Cangjie Programming Language Documentation
Introduction to Cangjie Language
Cangjie is a general-purpose programming language designed for full-scenario application development, balancing development efficiency with runtime performance while providing an excellent programming experience. Its key features include:
- Multi-Backend Support: Cangjie supports two backends: CJNative and CJVM. The CJNative backend compiles code into native binaries that run directly at the operating system level, while the CJVM backend compiles code into bytecode that runs on a VM (Virtual Machine). This documentation is adapted for the CJNative backend.
- Concise and Efficient Syntax: Cangjie offers a series of concise and efficient syntax features aimed at reducing redundant coding and improving development efficiency. Examples include interpolated strings, primary constructors, Flow expressions, match statements, and re-exports, allowing developers to express logic with minimal code.
- Multi-Paradigm Programming: Cangjie supports functional, imperative, and object-oriented programming paradigms. It incorporates advanced functional language features such as higher-order functions, algebraic data types, pattern matching, and generics, alongside object-oriented language features like encapsulation, interfaces, inheritance, and subtype polymorphism for modular development. It also includes imperative language features such as value types and global functions for simplicity and efficiency. Developers can choose the paradigm that best suits their preferences or application scenarios.
- Type Safety: Cangjie is a statically and strongly typed language, enabling early detection of program errors through compile-time type checking, reducing runtime risks, and facilitating code maintenance. Additionally, the Cangjie compiler provides robust type inference capabilities, minimizing the need for type annotations and enhancing development efficiency.
- Memory Safety: Cangjie supports automatic memory management and performs runtime checks for array bounds, overflow, and other operations to ensure memory safety during execution.
- Efficient Concurrency: Cangjie provides lightweight user-mode threads (native coroutines) and an easy-to-use concurrency programming mechanism, ensuring efficient development and execution in concurrent scenarios.
- Language Ecosystem Compatibility: Cangjie supports interoperability with languages like C and adopts a convenient declarative programming paradigm, enabling efficient reuse and compatibility with libraries from other languages.
- Easy Domain-Specific Extensibility: Cangjie offers metaprogramming capabilities based on lexical macros, allowing code transformation at compile time. It also includes features such as trailing lambda, attributes, operator overloading, and optional keywords, enabling developers to deeply customize syntax and semantics. This facilitates the construction of Embedded Domain-Specific Languages (EDSLs).
- Enhanced UI Development: UI development is a critical aspect of building device-side applications. Leveraging Cangjie’s metaprogramming and trailing lambda features, users can create declarative UI development frameworks to improve efficiency and experience in UI development.
- Rich Built-in Libraries: Cangjie provides a comprehensive set of built-in libraries covering data structures, common algorithms, mathematical computations, regular expressions, system interactions, file operations, network communication, database access, logging, compression/decompression, encoding/decoding, encryption/decryption, and serialization.
Installing the Cangjie Toolchain
When developing Cangjie programs, one essential tool is the Cangjie compiler, which can compile Cangjie source code into executable binary files. However, modern programming languages come with more than just compilers. In fact, Cangjie provides developers with a comprehensive suite of development tools, including compilers, debuggers, project managers, static analysis tools, formatting tools, and coverage statistics tools, all designed for a seamless “out-of-the-box” experience.
Currently, the Cangjie toolchain has been adapted for certain versions of Linux, macOS, and Windows platforms. However, full functional testing has only been conducted on select Linux distributions. For details, please refer to the appendix section Support and Installation of Linux Version Toolchain. On platforms that have not undergone full functional testing, the completeness of the Cangjie toolchain’s functionality is not guaranteed. Additionally, the current Windows version of the Cangjie compiler is implemented based on MinGW and may lack some features compared to the Linux version.
Linux / macOS
Environment Preparation
Linux
The system requirements for the Linux version of the Cangjie toolchain are as follows:
| Architecture | Environment Requirements |
|---|---|
| x86_64 | glibc 2.27, Linux Kernel 4.15 or later, with libstdc++ 6.0.24 or later installed |
| aarch64 | glibc 2.27, Linux Kernel 4.15 or later, with libstdc++ 6.0.24 or later installed |
For Ubuntu 18.04, additional dependency packages must be installed:
$ apt-get install binutils libc-dev libc++-dev libgcc-7-dev
For dependency installation commands for other Linux distributions, please refer to the appendix section Support and Installation of Linux Version Toolchain.
Additionally, the Cangjie toolchain depends on the OpenSSL 3 component, which may not be available in the default repositories of the above distributions. Therefore, it must be manually installed. For installation instructions, please refer to the appendix section Support and Installation of Linux Version Toolchain.
macOS
The macOS version of the Cangjie toolchain supports macOS 12.0 and later.
Before using the macOS version, install the required dependency package with the following command:
$ brew install libffi
Installation Guide
First, visit the official Cangjie release channel to download the installation package for your platform’s architecture:
- cangjie-sdk-linux-x64-x.y.z.tar.gz: For x86_64 architecture Linux systems
- cangjie-sdk-linux-aarch64-x.y.z.tar.gz: For aarch64 architecture Linux systems
- cangjie-sdk-mac-x64-x.y.z.tar.gz: For x86_64 architecture macOS systems
- cangjie-sdk-mac-aarch64-x.y.z.tar.gz: For aarch64/arm64 architecture macOS systems
Assuming you have selected cangjie-sdk-linux-x64-x.y.z.tar.gz, after downloading it locally, execute the following command to extract it:
tar xvf cangjie-sdk-linux-x64-x.y.z.tar.gz
After extraction, you will see a directory named cangjie in the current working path, which contains all components of the Cangjie toolchain. Execute the following command to complete the installation and configuration:
source cangjie/envsetup.sh
To verify the installation, execute the following command:
cjc -v
Here, cjc is the executable filename of the Cangjie compiler. If the command line displays the Cangjie compiler version information, the toolchain has been successfully installed. Note that the envsetup.sh script only configures the toolchain-related environment variables for the current shell session. To use the toolchain in a new shell session, you must re-execute the envsetup.sh script.
To make the Cangjie toolchain environment variables automatically effective upon shell startup, add the following command to the end of your shell initialization file (e.g., $HOME/.bashrc or $HOME/.zshrc, depending on your shell type):
# Assuming the Cangjie package is extracted to /home/user/cangjie
source /home/user/cangjie/envsetup.sh # The absolute path to envsetup.sh
After configuration, the Cangjie compilation toolchain will be directly available upon shell startup.
Uninstallation and Update
On Linux and macOS platforms, to uninstall the Cangjie toolchain, simply delete the installation directory and remove the environment variables (the simplest way is to open a new shell session):
$ rm -rf <path>/<to>/cangjie
To update the Cangjie toolchain, first uninstall the current version, then follow the above instructions to reinstall the latest version.
Windows
This section uses Windows 10 as an example to introduce the installation method for the Cangjie toolchain.
Installation Guide
On Windows, Cangjie provides two formats of installation packages: exe and zip. Visit the official Cangjie release channel to download the appropriate Windows version for your platform’s architecture.
- If you choose the exe format installation package (e.g., cangjie-sdk-windows-x64-x.y.z.exe), simply execute the file and follow the installation wizard to complete the installation.
- If you choose the zip format installation package (e.g., cangjie-sdk-windows-x64-x.y.z.zip), extract it to an appropriate directory. The package provides three different installation scripts: envsetup.bat, envsetup.ps1, and envsetup.sh. Choose one based on your usage habits and environment configuration:
  - For Windows Command Prompt (CMD), execute:
    path\to\cangjie\envsetup.bat
  - For PowerShell, execute:
    . path\to\cangjie\envsetup.ps1
  - For MSYS shell, bash, etc., execute:
    source path/to/cangjie/envsetup.sh
To verify the installation, execute cjc -v in the same command environment. If the Cangjie compiler version information is displayed, the toolchain has been successfully installed.
Important Note:
Similar to Linux, the environment variables configured by the envsetup script are only effective for the current command-line session. To make the Cangjie toolchain automatically available upon command prompt or terminal startup, configure the system as follows:
- For bash environments, follow these steps:
  Add the following command to the end of your $HOME/.bashrc initialization file ($HOME is the path to the current user’s directory):
  # Assuming the Cangjie package is extracted to /home/user/cangjie
  source /home/user/cangjie/envsetup.sh # The absolute path to envsetup.sh
  After configuration, the Cangjie compilation toolchain will be directly available upon bash startup.
- For Windows Command Prompt (CMD), PowerShell, or other environments, follow these steps:
  - Search for “View advanced system settings” in the Windows search box and open the corresponding window.
  - Click the “Environment Variables” button.
  - Configure the CANGJIE_HOME variable as follows:
    - In the “User variables” (for the current user) or “System variables” (for all users) section, check if the CANGJIE_HOME environment variable already exists. If not, click “New” and enter CANGJIE_HOME in the “Variable name” field. If it exists, the environment may already be configured for Cangjie. To overwrite the existing configuration, click “Edit” to enter the “Edit System Variable” window.
    - In the “Variable value” field, enter the extraction path of the Cangjie installation package. If a path already exists, overwrite it with the new path. For example, if the package is extracted to D:\cangjie, enter D:\cangjie.
    - After configuration, the “Edit User Variable” or “Edit System Variable” window should display CANGJIE_HOME as the variable name and D:\cangjie as the variable value. Click “OK” after confirming the path.
  - Configure the Path variable as follows:
    - In the “User variables” or “System variables” section, locate and select the Path variable, then click “Edit” to open the “Edit Environment Variable” window.
    - Click “New” and enter the following paths one by one: %CANGJIE_HOME%\bin, %CANGJIE_HOME%\tools\bin, %CANGJIE_HOME%\tools\lib, %CANGJIE_HOME%\runtime\lib\windows_x86_64_cjnative (%CANGJIE_HOME% is the extraction path of the Cangjie package). For example, if the package is extracted to D:\cangjie, the new environment variables should be: D:\cangjie\bin, D:\cangjie\tools\bin, D:\cangjie\tools\lib, D:\cangjie\runtime\lib\windows_x86_64_cjnative.
    - (For current user settings only) Click “New” and enter the current user directory path, appending .cjpm\bin to it. For example, if the user path is C:\Users\bob, enter C:\Users\bob\.cjpm\bin.
    - After configuration, the “Edit Environment Variable” window should display the paths as follows. Click “OK” after confirming the paths:
      D:\cangjie\bin
      D:\cangjie\tools\bin
      D:\cangjie\tools\lib
      D:\cangjie\runtime\lib\windows_x86_64_cjnative
      C:\Users\bob\.cjpm\bin
  - Click “OK” to exit the “Environment Variables” window.
  - Click “OK” to complete the setup.
  Note: After configuration, you may need to restart the command-line window or the system for the changes to take effect.
  After configuration, the Cangjie compilation toolchain will be directly available upon Windows Command Prompt (CMD) or PowerShell startup.
Uninstallation and Update
- If you installed using the exe format package, run the unins000.exe executable in the Cangjie installation directory and follow the uninstallation wizard to complete the process.
- If you installed using the zip format package, delete the Cangjie toolchain installation directory and remove the above environment variable settings (if any) to complete the uninstallation.
To update the Cangjie toolchain, first uninstall the current version, then follow the above instructions to reinstall the latest version.
Running Your First Cangjie Program
Everything is ready—let’s start writing and running your first Cangjie program!
Compiling with cjc
First, create a new text file named hello.cj in an appropriate directory and write the following Cangjie code into it:
// hello.cj
main() {
    println("Hello, Cangjie")
}
In this code snippet, we use Cangjie’s comment syntax. Single-line comments can be written after the // symbol, while multi-line comments can be written between /* and */ symbols, which is identical to the comment syntax in languages like C/C++. Comment content does not affect program compilation and execution.
Next, execute the following command in this directory:
cjc hello.cj -o hello
Here, the Cangjie compiler will compile the source code in hello.cj into an executable file named hello for the current platform. When you run this file in a command-line environment, you’ll see the program output the following content:
Hello, Cangjie
Note:
The above compilation command is for Linux and macOS platforms. If you’re using Windows, simply modify the compilation command to cjc hello.cj -o hello.exe.
Compiling and Running with cjpm
In addition to using the cjc compiler directly, you can also use the Cangjie Project Manager (cjpm) to quickly create, manage, and run Cangjie projects.
Please follow the steps below to create your first Cangjie project:
- Create a new directory named hello_cjpm to store the project files, and then enter the directory.
- Use the cjpm init command to initialize a new Cangjie module:
cjpm init
After a successful execution, the command line will show cjpm init success. At this point, cjpm generates the default project structure in the current directory:
hello_cjpm
├── cjpm.toml // The configuration file for the project
└── src
    └── main.cj // The default source code file
The content of the default source file main.cj is as follows:
// main.cj
package hello_cjpm // Declares that the current source file belongs to the hello_cjpm package
main(): Int64 {
    println("hello world")
    return 0
}
Alternatively, you can run cjpm init --path hello_cjpm. cjpm will automatically create the hello_cjpm directory and initialize the project inside it.
In the project root directory (where cjpm.toml is located), run the following command to compile and run the program:
cjpm run
cjpm will automatically handle dependency checks, compilation, and the execution of the program. The following output will be shown on the command line:
hello world
cjpm run finished
Identifiers
In the Cangjie programming language, developers can assign names to certain program elements, which are referred to as “identifiers.”
Before learning about identifiers, it is necessary to understand some concepts related to the Unicode character set. In the Unicode standard, the XID_Start and XID_Continue properties mark the characters that can serve as the starting character and the subsequent characters of a Unicode identifier, respectively. For detailed definitions, please refer to the Unicode Standard Documentation. XID_Start includes characters such as Chinese characters and English letters, while XID_Continue includes Chinese characters, English letters, Arabic numerals, and more. The Cangjie language uses Unicode Standard 15.0.0.
Identifiers in the Cangjie programming language are divided into two categories: regular identifiers and raw identifiers, each following different naming rules.
Regular identifiers cannot be the same as Cangjie keywords and are derived from the following two types of character sequences:
- A sequence starting with an XID_Start character, followed by any number of XID_Continue characters.
- A sequence starting with an _, followed by at least one XID_Continue character.
Cangjie recognizes all identifiers in Normalization Form C (NFC). Two identifiers are considered the same if they are equal after NFC normalization.
For example, each of the following strings is a valid regular identifier:
abc
_abc
abc_
a1b2c3
a_b_c
a1_b2_c3
Cangjie
__こんにちは
Each of the following strings is an invalid regular identifier:
ab&c // & is not an XID_Continue character
3abc // Arabic numerals are not XID_Start characters, so they cannot be used as starting characters
_ // An underscore must be followed by at least one XID_Continue character
while // "while" is a Cangjie keyword and cannot be used as a regular identifier
Raw identifiers are regular identifiers or Cangjie keywords enclosed in a pair of backticks. They are primarily used in scenarios where Cangjie keywords need to be used as identifiers.
For example, each of the following strings is a valid raw identifier:
`abc`
`_abc`
`a1b2c3`
`if`
`while`
`à֮̅̕b`
Each of the following strings is an invalid raw identifier because the content inside the backticks is an invalid regular identifier:
`ab&c`
`3abc`
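As a minimal sketch of how the two identifier categories look in use, the following program binds one variable through a raw identifier built from a keyword and another through a regular identifier that starts with an underscore. The variable names are illustrative and not taken from the original text.

```cangjie
main() {
    // `while` is a keyword, so it can only serve as a name when written as a raw identifier
    let `while` = 1
    // _count is a regular identifier: an underscore followed by XID_Continue characters
    let _count = 2
    println(`while` + _count)
}
```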
Program Structure
Typically, developers write Cangjie programs in text files with the .cj extension. These programs and files are referred to as source code and source files, respectively. In the final stage of program development, the source code is compiled into binary files of a specific format.
At the top-level scope of a Cangjie program, a series of variables, functions, and custom types (such as struct, class, enum, and interface) can be defined. Among these, variables and functions are called global variables and global functions, respectively. To compile a Cangjie program into an executable file, a main function must be defined at the top-level scope as the program entry point. This function can either take a parameter of type Array<String> or no parameters at all, and its return type can be an integer type or the Unit type.
Note:
When defining the main function, the func modifier is not required. Additionally, if command-line arguments are needed during program startup, a parameter of type Array<String> can be declared and used.
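The form of main that takes command-line arguments can be sketched as follows. This is an illustrative example, assuming the parameter is named args (a hypothetical name) and that the program returns an integer status.

```cangjie
// Entry point that receives command-line arguments and returns an integer
main(args: Array<String>): Int64 {
    println("received ${args.size} argument(s)")
    for (arg in args) {
        println(arg)
    }
    return 0
}
```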
For example, in the following program, the top-level scope defines the global variable a, the global function b, the custom types C, D, and E, as well as the main function serving as the program entry point.
// example.cj
let a = 2023
func b() {}
struct C {}
class D {}
enum E { F | G }
main() {
    println(a)
}
In non-top-level scopes, the aforementioned custom types cannot be defined, but variables and functions can be defined, referred to as local variables and local functions. Specifically, variables and functions defined within custom types are called member variables and member functions.
Note:
enum and interface only support defining member functions and do not allow member variables.
For example, in the following program, the top-level scope defines the global function a and the custom type A. Within the function a, the local variable b and local function c are defined, while within the custom type A, the member variable b and member function c are defined.
// example.cj
func a() {
    let b = 2023
    func c() {
        println(b)
    }
    c()
}
class A {
    let b = 2024
    public func c() {
        println(b)
    }
}
main() {
    a()
    A().c()
}
Running the above program will output:
2023
2024
Variables
In the Cangjie programming language, a variable consists of a corresponding variable name, data (value), and several attributes. Developers access the data associated with a variable through its name, but such access operations must comply with the constraints of the relevant attributes (such as data type, mutability, and visibility).
The specific form of variable definition is:
modifier variable_name: variable_type = initial_value
Here, modifiers are used to set various attributes of the variable and can be one or more. Commonly used modifiers include:
- Mutability modifiers: let and var, corresponding to immutable and mutable attributes, respectively. Mutability determines whether a variable’s value can be changed after initialization, thus dividing Cangjie variables into immutable and mutable types.
- const modifier: const is a special variable modifier used to declare constants. It requires initialization at declaration and prohibits any changes to its value afterward. This is similar to the let modifier in terms of immutability but imposes stricter usage restrictions.
- Visibility modifiers: private and public, among others, which affect the reference scope of global variables and member variables. For details, refer to the relevant sections in subsequent chapters.
- Static modifiers: static, which affect the storage and referencing of member variables. For details, refer to the relevant sections in subsequent chapters.
All variables support the assignment operator (=), regardless of type. Variables modified by let can only be assigned once (i.e., initialized), while those modified by var can be assigned multiple times.
When defining a Cangjie variable, a mutability modifier is mandatory. Additional modifiers can be added as needed.
- Variable name must be a valid Cangjie identifier.
- Variable type specifies the type of data held by the variable. When the initial value has a clear type, the variable type annotation can be omitted, allowing the compiler to infer the type automatically.
- Initial value is a Cangjie expression used to initialize the variable. If the variable type is annotated, the initial value type must match the variable type. Global variables or static member variables must be initialized at definition. Local variables or instance member variables can omit the initial value but must have a type annotation. They must be initialized before being referenced; otherwise, a compilation error will occur.
For example, the following program defines three Int64 variables (the immutable variable a, the mutable variable b, and the const variable c). It then modifies the value of b, assigns b’s value to a, and prints the values of a, b, and c using the println function.
main() {
    let a: Int64
    var b: Int64 = 14
    const c: Int64 = 13
    b = 12
    a = b // A variable modified by let can only be assigned once, that is, initialized
    println("${a}, ${b}, ${c}")
}
Compiling and running this program will output:
12, 12, 13
Attempting to modify an immutable variable will result in a compilation error, for example:
main() {
    let pi: Float64 = 3.14159
    pi = 2.71828 // Error, cannot assign to immutable value
}
When the initial value has a clear type, the variable type annotation can be omitted, for example:
main() {
    let a: Int64 = 2023
    let b = a
    println("a - b = ${a - b}")
}
Here, the type of variable b can be automatically inferred as Int64 from its initial value a, so this program can be compiled and run normally, outputting:
a - b = 0
When defining local variables, initialization can be omitted, but the variable must be assigned a value before being referenced, for example:
main() {
    let text: String
    text = "仓颉造字"
    println(text)
}
Compiling and running this program will output:
仓颉造字
Global variables and static member variables must be initialized at definition; otherwise, a compilation error will occur, for example:
let global: Int64 // Error, variable in top-level scope must be initialized
main(): Unit {
}
class Player {
    static let score: Int32 // Error, static variable 'score' needs to be initialized when declaring
}
Note that when the compiler cannot determine whether a variable is definitely initialized on every execution path, or whether an immutable variable might be initialized a second time, it conservatively reports a compilation error, as shown in the following example:
func calc(a: Int32) {
    println(a)
    return a * a
}
main() {
    let a: String
    if (calc(32) == 0) {
        a = "1"
    }
    a = "2" // Error, cannot assign to immutable value
}
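Additionally, in try-catch code, the compiler conservatively assumes that the try block always executes in full and that it always throws an exception. As a result, an assignment in the catch block is treated as a second assignment to a variable that the try block has already initialized, as shown in the following example: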
main() {
    let a: String
    try {
        a = "1"
    } catch (_) {
        a = "2" // Error, cannot assign to immutable value
    }
}
const Variables
const variables are a special type of variable modified by the keyword const. They are evaluated at compile time and cannot be changed during runtime. For example, the following defines the gravitational constant G:
const G = 6.674e-11
const variables can omit type annotations but cannot omit initialization expressions. They can be global variables, local variables, or static member variables. However, const variables cannot be defined in extensions. They can access all instance members of their corresponding types and call all non-mut instance member functions.
The following example defines a struct to record a planet’s mass and radius, along with a const member function gravity to calculate the gravitational force exerted by the planet on an object of mass m at distance r:
struct Planet {
    const Planet(let mass: Float64, let radius: Float64) {}
    const func gravity(m: Float64, r: Float64) {
        G * mass * m / r**2
    }
}
main() {
    const myMass = 71.0
    const earth = Planet(5.972e24, 6.378e6)
    println(earth.gravity(myMass, earth.radius))
}
Compiling and executing this will output the gravitational force exerted by Earth on a 71 kg adult standing on its surface:
695.657257
Once a const variable is initialized, every member of the instance it holds is also const (deep const, including members of members), so none of them can be used as an lvalue.
main() {
    const myMass = 71.0
    myMass = 70.0 // Error, cannot assign to immutable value
}
Value-Type and Reference-Type Variables
From the compiler’s implementation perspective, any variable is always associated with a value (typically via a memory address/register). However, for some variables, the value itself is directly used, which are called value-type variables. For others, the value serves as an index to access the data it points to, which are called reference-type variables. Value-type variables are usually allocated on the thread stack, with each variable having its own data copy. Reference-type variables are usually allocated on the process heap, with multiple variables potentially referencing the same data object. Operations on one variable may affect others.
From the language perspective, value-type variables exclusively bind to their data/storage space, while reference-type variables share their data/storage space with other reference-type variables.
Based on these principles, there are behavioral differences between value-type and reference-type variables, with the following points worth noting:
- Assigning to a value-type variable typically involves a copy operation, and the originally bound data/storage space is overwritten. Assigning to a reference-type variable only changes the reference relationship, leaving the originally bound data/storage space unaffected.
- Variables defined with let cannot be reassigned after initialization. For reference types, this only restricts the reference relationship from changing; the referenced data can still be modified.
In the Cangjie programming language, class and Array types are reference types, while other basic data types and struct types are value types.
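The point about let and reference types can be illustrated with a minimal sketch. The class Counter and its member count are hypothetical names introduced only for this example.

```cangjie
class Counter {
    var count = 0
}

main() {
    let c = Counter()
    // Allowed: let fixes which instance c refers to, not the data inside that instance
    c.count = 5
    // c = Counter()   // would be rejected: c was declared with let and is already initialized
    println(c.count)
}
```

Here the let binding only prevents c from being pointed at a different Counter instance; the var member of the instance it refers to stays writable.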
For example, the following program demonstrates the behavioral differences between struct and class type variables:
struct Copy {
    var data = 2012
}
class Share {
    var data = 2012
}
main() {
    let c1 = Copy()
    var c2 = c1
    c2.data = 2023
    println("${c1.data}, ${c2.data}")
    let s1 = Share()
    let s2 = s1
    s2.data = 2023
    println("${s1.data}, ${s2.data}")
}
Running the above program will output:
2012, 2023
2023, 2023
From this, we can observe that for value-type Copy variables, assignment always obtains a copy of the Copy instance, such as c2 = c1. Subsequent modifications to c2 members do not affect c1. For reference-type Share variables, assignment establishes a reference relationship between the variable and the instance, such as s2 = s1. Subsequent modifications to s2 members will affect s1.
If we change var c2 = c1 to let c2 = c1 in the above program, compilation fails with an error, because the members of a let-declared value-type variable cannot be modified:
struct Copy {
    var data = 2012
}
main() {
    let c1 = Copy()
    let c2 = c1
    c2.data = 2023 // Error, cannot assign to immutable value
}
Scope
Earlier, we briefly introduced how to name elements in Cangjie programs. In practice, besides variables, names can also be assigned to functions and custom types, and these names are used to access the corresponding program elements.
However, in practical applications, some special cases need to be considered:
- When a program is large, short names are prone to duplication, leading to naming conflicts.
- Considering runtime scenarios, in some code segments, certain program elements are invalid, and referencing them will cause runtime errors. For example, some variables become invalid after their scope is exited.
- In certain logical constructs, to express containment relationships between elements, sub-elements should not be accessed directly by name but through their parent element names indirectly.
To address these issues, modern programming languages introduce the concept and design of “scope,” limiting the binding relationship between names and program elements to a specific range. Scopes can be parallel, unrelated, nested, or contain each other. A scope clearly defines which program elements can be accessed by which names, with the following specific rules:
1. The binding relationship between program elements and names defined in the current scope is valid within the current scope and its inner scopes, allowing direct access to the corresponding program elements via these names.
2. The binding relationship between program elements and names defined in an inner scope is invalid in outer scopes.
3. Inner scopes can redefine binding relationships using names from outer scopes. According to rule 1, the naming in the inner scope effectively shadows the same-name definition in the outer scope. In this case, the inner scope is said to have a higher level than the outer scope.
In the Cangjie programming language, a pair of curly braces {} enclosing a segment of Cangjie code creates a new scope. Within this scope, further curly braces {} can enclose more Cangjie code, resulting in nested scopes. These scopes all adhere to the above rules. Specifically, in a Cangjie source file, code not enclosed by any curly braces {} belongs to the “top-level scope,” the “outermost” scope in the current file, which, according to the above rules, has the lowest scope level.
Note:
Cangjie does not allow standalone curly braces {}. Curly braces must depend on other syntactic structures such as if, match, function bodies, class bodies, or struct bodies.
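The first two scope rules can be sketched with a short program. The names outer and inner are hypothetical and chosen only for illustration; the commented-out line shows an access that the rules forbid.

```cangjie
main() {
    let outer = 1
    if (outer > 0) {
        // A name from the enclosing scope is visible in the inner scope
        let inner = outer + 1
        println(inner)
    }
    // println(inner)   // would not compile: inner is bound only inside the if block
}
```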
For example, in the following Cangjie source file named test.cj, the name element is defined in the top-level scope, bound to the string “Cangjie.” Within the main and if blocks, the name element is also defined, corresponding to the integer 9 and the integer 2023, respectively. According to the scope rules, at line 4, element has the value “Cangjie”; at line 8, element has the value 2023; and at line 10, element has the value 9.
// test.cj
let element = "Cangjie"
main() {
    println(element)
    let element = 9
    if (element > 0) {
        let element = 2023
        println(element)
    }
    println(element)
}
Running the above program will output:
Cangjie
2023
9
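The second scope rule (names defined in an inner scope are invalid in outer scopes) can be sketched in the same way. The names outer and inner below are illustrative only and do not come from the original example:

```cangjie
main() {
    let outer = 1
    if (outer > 0) {
        let inner = 2           // defined in the inner scope of the if block
        println(outer + inner)  // both names are visible here; prints 3
    }
    // println(inner)           // would not compile: inner is not visible in the outer scope
    println(outer)              // prints 1
}
```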
Expressions
In some traditional programming languages, an expression consists of one or more operands combined by zero or more operators. An expression always implies a computation process, so each expression will have a computation result. For expressions containing only operands without operators, the computation result is the operand itself. For expressions containing operators, the computation result is the value obtained by performing the operations defined by the operators on the operands. Expressions defined in this way are also called arithmetic expressions. For operator precedence, refer to the Operators chapter.
In the Cangjie programming language, the traditional definition of expressions is simplified and extended—any language element that can be evaluated is considered an expression. Therefore, Cangjie not only has traditional arithmetic expressions but also conditional expressions, loop expressions, and try expressions, all of which can be evaluated and used as values, such as initial values for variable definitions and function arguments. Additionally, because Cangjie is a strongly typed programming language, Cangjie expressions not only can be evaluated but also have a definite type.
Note:
To clearly distinguish between different program statements or expressions, Cangjie uses semicolons (;) as separators. If a statement occupies a line by itself, the semicolon can be omitted. However, if multiple statements exist on the same line, they must be separated by semicolons.
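A minimal sketch of the semicolon rule, using illustrative variable names:

```cangjie
main() {
    let a = 1; let b = 2   // two definitions on one line: the semicolon is required
    let c = a + b          // one statement per line: the semicolon can be omitted
    println(c)             // prints 3
}
```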
Various expressions in the Cangjie programming language will be introduced in subsequent chapters. This section covers the most commonly used conditional expressions, loop expressions, and some control transfer expressions (break, continue).
The execution flow of any program involves only three basic structures—sequential structure, branching structure, and looping structure. In fact, branching and looping structures are obtained by certain instructions causing jumps in the current sequential execution flow, enabling the program to express more complex logic. In Cangjie, the language elements used to control the execution flow are conditional expressions and loop expressions.
In the Cangjie programming language, the conditional expression is the if expression, whose value and type depend on the usage context. There are three types of loop expressions: for-in expressions, while expressions, and do-while expressions, all of which have the type Unit and the value ().
In Cangjie programs, a group of expressions enclosed by a pair of curly braces {} is called a “code block,” which serves as a sequential execution flow in the program. The expressions within the block are executed in the order they are written. If a code block contains at least one expression, the value and type of the block are defined to be equal to the value and type of its last expression. If the code block contains no expressions, such an empty block is defined to have the type Unit and the value ().
Note:
A code block itself is not an expression and cannot be used alone. It must be attached to functions, conditional expressions, loop expressions, etc., for execution and evaluation.
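As a small illustration of how a code block takes the value and type of its last expression, the sketch below attaches two blocks to an if expression (introduced in the next section). The names base, bonus, and adjusted are illustrative:

```cangjie
main() {
    let base = 20
    let adjusted = if (base > 10) {
        let bonus = 5   // a definition local to this block
        base + bonus    // last expression: the block has type Int64 and value 25
    } else {
        base            // last expression: the block has type Int64
    }
    println(adjusted)   // prints 25
}
```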
if Expression
The basic form of the if expression is:
if (condition) {
    branch1
} else {
    branch2
}
Here, “condition” can be a Boolean-type expression, a “let pattern” (syntactic sugar), or multiple “let patterns” and Boolean-type expressions connected directly by logical AND or OR operations. For examples involving “let patterns,” refer to Examples of “Conditions” Involving let Patterns.
When the expression and pattern match successfully, the pattern match evaluates to true, and the if branch’s corresponding code block is executed. Otherwise, it evaluates to false, and the else branch’s code block is executed. The else branch is optional.
“Branch1” and “branch2” are two code blocks. The if expression executes according to the following rules:
1. Evaluate the “condition” expression. If the value is true, proceed to step 2; if false, proceed to step 3.
2. Execute “branch1,” then proceed to step 4.
3. Execute “branch2,” then proceed to step 4.
4. Continue executing the code following the if expression.
In some scenarios, only the actions to take when the condition is true may be of interest, so the else and its corresponding code block can be omitted.
The following program demonstrates the basic usage of the if expression:
import std.random.Random
main() {
    let number: Int8 = Random().nextInt8()
    println(number)
    if (number % 2 == 0) {
        println("Even")
    } else {
        println("Odd")
    }
}
In this program, a random integer is generated using the random package from the Cangjie standard library. The if expression checks whether this integer is divisible by 2 and prints “Even” or “Odd” in the respective branches.
The Cangjie programming language is strongly typed. The condition in an if expression must be of Boolean type; integers, floating-point numbers, etc., cannot be used. Unlike C and similar languages, Cangjie does not use whether the condition evaluates to zero as the basis for branching. For example, the following program will result in a compilation error (additional incorrect expression examples are provided in Incorrect Expression Examples for comparison):
main() {
    let number = 1
    if (number) { // Compilation error: type mismatch
        println("Non-zero")
    }
}
In many scenarios, when one condition fails, another or multiple conditions may need to be checked before executing corresponding actions. Cangjie allows new if expressions to follow else, enabling multi-level conditional checks and branching. For example:
import std.random.Random
main() {
    let speed = Random().nextFloat64() * 20.0
    println("${speed} km/s")
    if (speed > 16.7) {
        println("Third cosmic velocity: Magpie Bridge Rendezvous")
    } else if (speed > 11.2) {
        println("Second cosmic velocity: Chang'e Flies to the Moon")
    } else if (speed > 7.9) {
        println("First cosmic velocity: Soaring Through Clouds")
    } else {
        println("Stay grounded, gaze at the stars")
    }
}
The value and type of an if expression depend on its usage form and context:
- When an if expression with an else branch is evaluated, its type is determined based on the evaluation context:
    - If the context explicitly requires a value of type T, the types of all branch code blocks in the if expression must be subtypes of T. The if expression’s type is then determined as T. If the subtype constraint is not satisfied, a compilation error occurs. For example, in the following code, the variable b is declared as Int64, but the branch code blocks have type String, which does not satisfy the subtype constraint, so a compilation error occurs:
      var a = 10
      var b: Int64 = if (a == 10) { // Error, mismatched types
          "this is 10"
      } else {
          "this is not 10"
      }
    - If the context does not have explicit type requirements, the if expression’s type is the least common supertype of all branch code block types. If no least common supertype exists, a compilation error occurs. For example, in the following code, since string and numeric types have no least common supertype, a compilation error occurs:
      var a = 10
      var b = if (a == 10) { // Error, types Struct-String and Int64 of the two branches of this 'if' expression mismatch
          "this is 10"
      } else {
          20
      }
  If compilation succeeds, the if expression’s value is the value of the executed branch’s code block.
- When an if expression with an else branch is not evaluated (that is, its value is not used), developers typically only want to perform different operations in different branches without concerning themselves with the values and types of the last expressions in each branch. To keep the above type-checking rules from interfering with this mental model, Cangjie stipulates that in such scenarios the if expression’s type is Unit, its value is (), and the branches do not participate in the above type checks.
- For if expressions without an else branch, since the if branch may not be executed, such if expressions are defined to have the type Unit and the value ().
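For contrast with the two failing snippets above, here is a sketch of the corresponding cases that satisfy the rules. The variable names are illustrative:

```cangjie
main() {
    let a = 10
    // The context requires Int64 and both branch blocks end in an Int64 expression
    let b: Int64 = if (a == 10) {
        1
    } else {
        2
    }
    // No explicit type: both branch blocks are String, so c is inferred as String
    let c = if (a == 10) {
        "this is 10"
    } else {
        "this is not 10"
    }
    println(b) // prints 1
    println(c) // prints this is 10
}
```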
For example, the following program uses an if expression evaluation to simulate a simple analog-to-digital conversion process:
main() {
    let zero: Int8 = 0
    let one: Int8 = 1
    let voltage = 5.0
    let bit = if (voltage < 2.5) {
        zero
    } else {
        one
    }
}
In the above program, the if expression is used as the initial value for a variable definition. Since the variable bit is not explicitly typed and its type must be inferred from the initial value, the if expression’s type is determined as the least common supertype of the two branch code block types. As explained earlier regarding “code blocks,” both branch code block types are Int8, so the if expression’s type is determined as Int8, and its value is the value of the executed branch (the else branch’s code block). Thus, the variable bit has the type Int8 and the value 1.
Examples of “Conditions” Involving let Patterns
“let patterns” are syntactic sugar. A “let pattern” has the form let pattern <- expression, where:
- pattern: A pattern used to match the value type and content of expression.
- <-: The pattern-matching operator.
- expression: An expression whose value is evaluated and then matched against the pattern. The precedence of the expression must not be lower than the .. operator, but parentheses can be used to change precedence. For operator precedence, refer to Operators.
Here are examples of “conditions” involving logical AND or OR operations between two “let patterns” or between a “let pattern” and other expressions.
main() {
    let a = Some(3)
    let c = if (let Some(b) <- a) {
        1 // Pattern match succeeds, c = 1
    } else {
        2
    }
    let d = Some(1)
    if (let Some(e) <- a && let Some(f) <- d) { // Both patterns match; condition evaluates to true
        println("${e} ${f}") // prints 3 1
    }
    if (let Some(f) <- d && f > 3) { // Pattern matches; f = 1, f > 3 check fails; jumps to else branch
        println("${f}")
    } else {
        println("d is None or value of d is less or equal to 3") // prints this line
    }
    if (let Some(_) <- a || let Some(_) <- d) { // Enum patterns connected by ||, no variable binding; correct
        println("at least one of a and d is Some") // prints this line
    } else {
        println("both a and d are None")
    }
    let g = 3
    if (let Some(_) <- a || g > 1) {
        println("this") // prints this line
    } else {
        println("that")
    }
}
In “let patterns,” the precedence of the expression part must not be lower than the .. operator. Here are corresponding incorrect and correct examples. For details on the Option type, refer to later sections.
if (let Some(a) <- fun() as Option<Int64>) {} // Parsing error: `as` has lower precedence than `..`
if (let Some(a) <- (fun() as Option<Int64>)) {} // Correct
if (let Some(a) <- b && a + b > 3) {} // Correct, parsed as (let Some(a) <- b) && (a + b > 3)
if (let m <- 0..generateSomeInt()) {} // Correct
Incorrect Expression Examples
Here are examples of incorrect “conditions.”
if (let Some(a) <- b || a > 1) {} // Conditions connected by `||` cannot use enum patterns that bind variables
if (let Some(a) <- b && a + 1) {} // Right side of `&&` is neither a let pattern nor a Boolean-type expression
if (a > 3 && let Some(a) <- b) {} // a is bound by Some(a) pattern; cannot use it on the left side of the binding pattern
if (let Some(a) <- b && a > 3) {
    println("${a} > 3")
} else {
    println("${a} < 3") // a can only be used in the if branch, not the else branch
}
if (let Some(a) <- b where a > 3) {} // Use `&&` for condition checks, not `where`
while Expression
The basic form of the while expression is:
while (condition) {
    loop_body
}
Here, “condition” is the same as in the if expression, and “loop_body” is a code block. The while expression executes according to the following rules:
1. Evaluate the “condition” expression. If the value is true, proceed to step 2; if false, proceed to step 3.
2. Execute “loop_body,” then proceed to step 1.
3. End the loop and continue executing the code following the while expression.
For example, the following program uses a while expression to approximate the square root of 2 using the bisection method:
main() {
    var root = 0.0
    var min = 1.0
    var max = 2.0
    var error = 1.0
    let tolerance = 0.1 ** 10
    while (error ** 2 > tolerance) {
        root = (min + max) / 2.0
        error = root ** 2 - 2.0
        if (error > 0.0) {
            max = root
        } else {
            min = root
        }
    }
    println("Square root of 2 ≈ ${root}")
}
Running this program will output:
Square root of 2 ≈ 1.414215
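A smaller sketch of the same evaluate, execute, repeat cycle may also help. It counts the decimal digits of a positive integer; the names n and digits are illustrative:

```cangjie
main() {
    var n = 12345
    var digits = 0
    while (n > 0) {   // the condition is checked before every iteration
        n = n / 10    // integer division drops the last digit
        digits++
    }
    println(digits)   // prints 5
}
```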
do-while Expression
The basic form of the do-while expression is:
do {
    loop_body
} while (condition)
Here, “condition” is a Boolean-type expression, and “loop_body” is a code block. The do-while expression executes according to the following rules:
1. Execute “loop_body,” then proceed to step 2.
2. Evaluate the “condition” expression. If the value is true, proceed to step 1; if false, proceed to step 3.
3. End the loop and continue executing the code following the do-while expression.
For example, the following program uses a do-while expression to approximate the value of π using the Monte Carlo method:
import std.random.Random
main() {
    let random = Random()
    var totalPoints = 0
    var hitPoints = 0
    do {
        // Randomly sample points within the square ((0, 0), (1, 1))
        let x = random.nextFloat64()
        let y = random.nextFloat64()
        // Determine if the point falls within the inscribed circle
        if ((x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.25) {
            hitPoints++
        }
        totalPoints++
    } while (totalPoints < 1000000)
    let pi = 4.0 * Float64(hitPoints) / Float64(totalPoints)
    println("Approximate value of pi: ${pi}")
}
Running the above program will output:
Approximate value of pi: 3.141872
Note:
Since the algorithm involves random numbers, the output may vary each time the program is run, but it will always approximate 3.14.
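Because the loop body runs before the condition is first checked, a do-while loop executes its body at least once even when the condition is false from the start. A minimal sketch, with an illustrative counter:

```cangjie
main() {
    var count = 0
    do {
        count++
        println("body executed ${count} time(s)")
    } while (count < 0)   // false on the first check, so the loop ends here
    println(count)        // prints 1
}
```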
for-in Expression
The for-in expression can iterate over instances of types that implement the iterator interface Iterable<T>. The basic form of the for-in expression is:
for (iterationVariable in sequence) {
    loopBody
}
Here, “loopBody” is a code block. The “iterationVariable” is a single identifier or a tuple composed of multiple identifiers, used to bind the data pointed to by the iterator in each iteration. It can be used as a local variable within the “loopBody”. The “sequence” is an expression that is evaluated only once, and the iteration is performed on the value of this expression. Its type must implement the iterator interface Iterable<T>. The for-in expression executes according to the following rules:
1. Evaluate the “sequence” expression, use its value as the iteration target, and initialize the iterator of the iteration target.
2. Update the iterator. If the iterator terminates, proceed to step 4; otherwise, proceed to step 3.
3. Bind the current data pointed to by the iterator to the “iterationVariable” and execute the “loopBody”, then return to step 2.
4. Terminate the loop and continue executing the code following the for-in expression.
Note:
Built-in types such as ranges and arrays in Cangjie already implement the Iterable<T> interface.
For example, the following program uses a for-in expression to iterate over the array noumenonArray composed of Chinese earthly branch characters, outputting the Heavenly Stems and Earthly Branches for each month of the lunar year 2024:
main() {
    let metaArray = [r'甲', r'乙', r'丙', r'丁', r'戊', r'己', r'庚', r'辛', r'壬', r'癸']
    let noumenonArray = [r'寅', r'卯', r'辰', r'巳', r'午', r'未', r'申', r'酉', r'戌', r'亥', r'子', r'丑']
    let year = 2024
    // Heavenly stem index corresponding to the year
    let metaOfYear = ((year % 10) + 10 - 4) % 10
    // Heavenly stem index for the first month of this year
    var index = (2 * metaOfYear + 3) % 10 - 1
    println("Heavenly Stems and Earthly Branches for Lunar Year 2024:")
    for (noumenon in noumenonArray) {
        print("${metaArray[index]}${noumenon} ")
        index = (index + 1) % 10
    }
}
Here, characters prefixed with r represent character literals. Running the above program will output:
Heavenly Stems and Earthly Branches for Lunar Year 2024:
丙寅 丁卯 戊辰 己巳 庚午 辛未 壬申 癸酉 甲戌 乙亥 丙子 丁丑
Iterating Over Ranges
The for-in expression can iterate over range type instances, for example:
main() {
    var sum = 0
    for (i in 1..=100) {
        sum += i
    }
    println(sum)
}
Running the above program will output:
5050
For detailed information about range types, refer to the basic data type Range Type.
Iterating Over Sequences of Tuples
If the elements of a sequence are of tuple type, the “iterationVariable” in the for-in expression can be written as a tuple to destructure the sequence elements, for example:
main() {
    let array = [(1, 2), (3, 4), (5, 6)]
    for ((x, y) in array) {
        println("${x}, ${y}")
    }
}
Running the above program will output:
1, 2
3, 4
5, 6
Iteration Variables Cannot Be Modified
Within the loop body of a for-in expression, the iteration variable cannot be modified. For example, the following program will result in a compilation error:
main() {
    for (i in 0..5) {
        i = i * 10 // Error: Cannot assign to an initialized `let` constant
        println(i)
    }
}
Using Wildcard _ as Iteration Variable
In some scenarios, you only need to perform certain operations in a loop without using the iteration variable. In such cases, you can use the wildcard _ in place of the iteration variable, for example:
main() {
    var number = 2
    for (_ in 0..5) {
        number *= number
    }
    println(number)
}
Running the above program will output:
4294967296
Note:
In such scenarios, if a regular identifier is used to define the iteration variable, the compilation will output an “unused variable” warning. Using the wildcard _ avoids this warning.
where Condition
In some iteration scenarios, you may need to skip specific values of the iteration variable and proceed to the next iteration. While this can be achieved using if and continue expressions within the loop body, Cangjie provides a more concise way—you can use the where keyword followed by a Boolean expression after the “sequence”. Before executing the loop body each time, this expression will be evaluated. If the value is true, the loop body will execute; otherwise, it will proceed directly to the next iteration. For example:
main() {
    for (i in 0..8 where i % 2 == 1) { // Loop body executes only when i is odd
        println(i)
    }
}
Running the above program will output:
1
3
5
7
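For comparison, the longer form mentioned above, using if and continue inside the loop body, could be sketched as follows. It is intended to print the same four numbers:

```cangjie
main() {
    for (i in 0..8) {
        if (i % 2 != 1) {
            continue   // skip even values, as the where condition did
        }
        println(i)     // prints 1, 3, 5, 7 on separate lines
    }
}
```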
break and continue Expressions
In loop structures, sometimes you need to terminate the loop early or skip the current iteration based on specific conditions. To facilitate this, Cangjie introduces the break and continue expressions. They can appear within the loop body of a loop expression. break terminates the current loop expression and proceeds to execute the code following the loop expression, while continue skips the remaining part of the current iteration and proceeds to the next iteration. Both break and continue expressions are of type Nothing.
For example, the following program uses a for-in expression and a break expression to find the first number divisible by 5 in a given integer array:
main() {
    let numbers = [12, 18, 25, 36, 49, 55]
    for (number in numbers) {
        if (number % 5 == 0) {
            println(number)
            break
        }
    }
}
When the for-in iteration reaches the third number 25 in the numbers array, since 25 is divisible by 5, the if branch containing println and break will execute. The break will terminate the for-in loop, and subsequent numbers in numbers will not be traversed. Therefore, running the above program will output:
25
The following program uses a for-in expression and a continue expression to print the odd numbers in a given integer array:
main() {
    let numbers = [12, 18, 25, 36, 49, 55]
    for (number in numbers) {
        if (number % 2 == 0) {
            continue
        }
        println(number)
    }
}
During iteration, when number is even, continue will be executed, skipping the remaining part of the current iteration (including println) and proceeding to the next iteration. Therefore, running the above program will output:
25
49
55
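break and continue are not limited to for-in loops. As a sketch, the following program uses break to leave a while loop whose condition is always true; the name power is illustrative:

```cangjie
main() {
    var power = 1
    while (true) {
        if (power > 1000) {
            break        // terminates the while loop
        }
        power *= 2
    }
    println(power)       // prints 1024, the first power of 2 greater than 1000
}
```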
Functions
In Cangjie, the keyword func is used to denote the start of a function definition. Following func are the function name, parameter list, optional return type, and function body. The function name can be any valid identifier. The parameter list is enclosed in parentheses (with multiple parameters separated by commas), a colon separates the parameter list and the return type (if present), and the function body is enclosed in curly braces.
Example of a function definition:
func add(a: Int64, b: Int64): Int64 {
    return a + b
}
In the above example, a function named add is defined. Its parameter list consists of two Int64 parameters, a and b, and its return type is Int64. The function body adds a and b together and returns the result.
For more details, refer to the Defining Functions module.
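A minimal sketch of calling such a function from main, repeating the add definition so the program is self-contained (the variable sum is illustrative):

```cangjie
func add(a: Int64, b: Int64): Int64 {
    return a + b
}

main() {
    let sum = add(3, 4)   // arguments are matched to parameters a and b in order
    println(sum)          // prints 7
}
```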
Basic Operators
Operators are symbols that perform specific mathematical or logical operations. For example, the mathematical operator plus (+) can add two numbers (e.g., let i = 1 + 2), and the logical operator AND (&&) can be used to combine and ensure multiple conditional judgments are satisfied (e.g., if (i > 0 && i < 10)).
The Cangjie programming language not only supports various commonly used operators but also improves some of them to reduce common coding errors. For instance, the type of an assignment expression (an expression containing an assignment operator) is Unit, and its value is (). If you write if(a = 3) instead of if(a == 3), the return value of the assignment expression is not of Boolean type, resulting in a compilation error. This helps avoid the issue of mistakenly using the assignment operator (=) instead of the equality operator (==). The results of arithmetic operators (+, -, *, /, %, etc.) are checked to prevent value overflow, thereby avoiding abnormal results when saving variables that exceed their type’s capacity.
The Cangjie programming language also provides range operators, such as a..b or a..=b, which conveniently express a range of values.
This chapter only describes the basic operators in the Cangjie programming language. For other operators, refer to the Operators section in the appendix. For information on how to overload operators for custom types, see the Operator Overloading chapter.
Assignment Operator
The assignment operator (=) changes the value of the left operand to the value of the right operand. The type of the right operand must be a subtype of the left operand’s type. When evaluating an assignment expression, the expression on the right side of = is always evaluated first, followed by the expression on the left side of =, and finally, the assignment is performed.
main(): Int64 {
    var a = 1
    var b = 1
    a = (b = 0) // Compilation error: the type of the assignment expression is Unit, and its value is ()
    if (a = 5) { // Compilation error: the type of the assignment expression is Unit, and its value is ()
    }
    a = b = 0 // Syntax error: chained use of assignments is not supported
    return 0
}
A multiple assignment expression is a special type of assignment expression. The left side of the equals sign in a multiple assignment expression must be a tuple (Tuple), and the elements in this tuple must all be lvalues. The right side of the expression must also be of tuple type, and each element in the right tuple must be a subtype of the corresponding lvalue’s type on the left. Notably, when _ appears in the left tuple, it indicates that the evaluation result of the corresponding position in the right tuple should be ignored (meaning the type check for this position will always pass). A multiple assignment expression can assign the values of the right tuple to the corresponding lvalues in the left tuple in one go, eliminating the need for individual assignments.
main(): Int64 {
    var a: Int64
    var b: Int64
    (a, b) = (1, 2) // a = 1, b = 2
    (a, b) = (b, a) // Swap: a = 2, b = 1
    (a, _) = (3, 4) // a = 3
    (_, _) = (5, 6) // No assignment
    return 0
}
Arithmetic Operators
The Cangjie programming language supports the following arithmetic operators: unary minus (-), addition (+), subtraction (-), multiplication (*), division (/), remainder (%), and exponentiation (**). Except for the unary minus, which is a unary prefix operator, all other operators are binary infix operators.
The operand of the unary minus (-) can only be a numeric-type expression. The value of a unary prefix minus expression is equal to the negative of its operand, and its type is the same as the operand’s type:
let num1: Int64 = 8
let num2 = -num1 // num2 = -8, its data type is "Int64".
let num3 = -(-num1) // num3 = 8, its data type is "Int64".
For the binary operators *, /, %, +, and -, the types of the two operands must be the same. The % operator only supports integer operands; *, /, +, and - can operate on any numeric type.
Note:
- When the operands of division (/) are integers, non-integer values are rounded toward 0 to become integers.
- The value of the integer remainder operation a % b is defined as a - b * (a / b).
- The addition operator can also be used for string concatenation.
let a = 2 + 3 // a = 5
let b = 3 - 1 // b = 2
let c = 3 * 4 // c = 12
let d = 7 / 3 // d = 2
let e = 7 / -3 // e = -2; unary minus has higher precedence, so this is 7 / (-3)
let f = -7 / 3 // f = -2
let g = -7 / -3 // g = 2; unary minus has higher precedence, so this is (-7) / (-3)
let h = 4 % 3 // h = 1
let i = 4 % -3 // i = 1; unary minus has higher precedence, so this is 4 % (-3)
let j = -4 % 3 // j = -1
let k = -4 % -3 // k = -1; unary minus has higher precedence, so this is (-4) % (-3)
let s1 = "abc"
var s2 = "ABC"
let r1 = s1 + s2 // r1 = "abcABC"
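The remainder definition a % b = a - b * (a / b) can be checked directly with a negative operand. This is a small sketch with illustrative names:

```cangjie
main() {
    let a = -7
    let b = 3
    println(a / b)             // prints -2 (rounded toward 0)
    println(a % b)             // prints -1
    println(a - b * (a / b))   // prints -1, matching a % b
}
```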
** represents exponentiation (e.g., x**y calculates the base x raised to the power of y). The left operand of ** can only be of type Int64 or Float64.
Note:
When the left operand is of type Int64, the right operand can only be of type UInt64, and the expression’s type is Int64. When the left operand is of type Float64, the right operand can be of type Int64 or Float64, and the expression’s type is Float64.
let p1 = 2 ** 3 // p1 = 8
let p2 = 2 ** UInt64(3 ** 2) // p2 = 512
let p3 = 2.0 ** 3 // p3 = 8.0
let p4 = 2.0 ** 3 ** 2 // p4 = 512.0
let p5 = 2.0 ** 3.0 // p5 = 8.0
let p6 = 2.0 ** 3.0 ** 2.0 // p6 = 512.0
Compound Assignment Operators
The Cangjie programming language also provides compound assignment operators: **=, *=, /=, %=, +=, -=, <<=, >>=, &=, ^=, |=, &&=, and ||=. Simple examples are as follows:
var a: Int64 = 10
a += 2 // a = 12
a -= 2 // a = 10
a **= 2 // a = 100
a *= 2 // a = 200
a /= 10 // a = 20
a %= 6 // a = 2
a <<= 2 // a = 8
When evaluating a compound assignment expression, the lvalue of the left expression is always evaluated first, then the rvalue is taken from this lvalue, and this rvalue is computed with the right expression (short-circuit rules are followed if applicable), and finally, the assignment is performed. Since a compound assignment expression is also an assignment expression, compound assignment operators are non-associative. Compound assignment expressions also require the two operands to be of the same type.
func foo(p: Point): Point {
    p.x += 10
    return p
}
open class Point {
    var x: Int64 = 0
    public init (a: Int64) {
        x = a
    }
}
main() {
    var a = Point(9) // a.x == 9
    var b = 2
    foo(a).x += (b + b) // a.x == 23
    println(a.x)
}
The above example demonstrates the evaluation order of compound assignment expressions. First, the value of foo(a).x is evaluated, resulting in a.x being 19; then, the value of b + b is computed and added to a.x.
Relational Operators
Relational operators include six types: equality (==), inequality (!=), less than (<), less than or equal to (<=), greater than (>), and greater than or equal to (>=). Relational operators are all binary operators and require the two operands to be of the same type. The type of a relational expression is Bool, meaning its value can only be true or false.
Examples of relational expressions:
main(): Int64 {
    3 < 4 // true
    3 <= 3 // true
    3 > 4 // false
    3 >= 3 // true
    3.14 == 3.15 // false
    3.14 != 3.15 // true
    return 0
}
For tuple types, a tuple type supports equality (==) and inequality (!=) operations only if all its elements support these operations; otherwise, the tuple type does not support == and != (using them will result in a compilation error). Two instances of the same tuple type are equal if and only if all elements at the same positions (indices) are equal (meaning their lengths are equal).
var isTrue: Bool = (1, 3) == (0, 2) // false
isTrue = (1, "123") == (1.0, 2) // Compilation error: the types of the two operands are inconsistent
isTrue = (1, _) == (1.0, _) // Compilation error: wildcards cannot be used as tuple elements for matching
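For contrast with the failing comparisons above, here is a sketch of tuple comparisons that should compile, since both operands have the same tuple type and every element type supports == and !=. The names p, q, and r are illustrative:

```cangjie
main() {
    let p = (1, "a")
    let q = (1, "a")
    let r = (1, "b")
    println(p == q)   // prints true: elements are equal at every position
    println(p != r)   // prints true: the second elements differ
}
```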
Coalescing Operator
The coalescing operator is denoted by ??, which is a binary infix operator. The coalescing operator is used for destructuring Option types.
For the expression e1 ?? e2, when the value of e1 is Option<T>.Some(v), the value of e1 ?? e2 is equal to the value of v (in this case, e2 is not evaluated, satisfying “short-circuit evaluation”); when the value of e1 is Option<T>.None, the value of e1 ?? e2 is equal to the value of e2.
Examples of coalescing expressions:
main(): Int64 {
    let v1 = Option<Int64>.Some(100)
    let v2 = Option<Int64>.None
    let r1 = v1 ?? 0
    let r2 = v2 ?? 0
    print("${r1}") // 100
    print("${r2}") // 0
    return 0
}
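The short-circuit behavior described above can be made visible by using a right operand with a side effect. In this sketch, fallback is a hypothetical helper introduced only for illustration:

```cangjie
// Hypothetical helper: prints a message so that its evaluation is observable
func fallback(): Int64 {
    println("fallback evaluated")
    return -1
}

main() {
    let present = Option<Int64>.Some(100)
    let absent = Option<Int64>.None
    println(present ?? fallback())   // prints 100; fallback() is not evaluated
    println(absent ?? fallback())    // prints "fallback evaluated", then -1
}
```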
Range Operators
There are two types of range operators: .. and ..=, used to create “left-closed right-open” and “left-closed right-closed” range instances, respectively. For more details, refer to the Range Type.
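The difference between the two operators is easiest to see when iterating. This is a minimal sketch; the variable names are illustrative:

```cangjie
main() {
    var openSum = 0
    for (i in 0..5) {      // left-closed right-open: 0, 1, 2, 3, 4
        openSum += i
    }
    var closedSum = 0
    for (i in 0..=5) {     // left-closed right-closed: 0, 1, 2, 3, 4, 5
        closedSum += i
    }
    println(openSum)       // prints 10
    println(closedSum)     // prints 15
}
```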
Logical Operators
The Cangjie programming language supports three logical operators: logical NOT (!), logical AND (&&), and logical OR (||).
Logical NOT (!) is a unary operator that negates the Boolean value of its operand: !false equals true, and !true equals false.
var a: Bool = true // a = true
var b: Bool = !a // b = false
var c: Bool = !false // c = true
Logical AND (&&) and logical OR (||) are both binary operators. For the expression expr1 && expr2, its value is true only if both expr1 and expr2 are true; for the expression expr1 || expr2, its value is false only if both expr1 and expr2 are false.
var a: Bool = true && true // a = true
var b: Bool = true && false // b = false
var c: Bool = false && false // c = false
var d: Bool = false && true // d = false
a = true || true // a = true
b = true || false // b = true
c = false || false // c = false
d = false || true // d = true
Logical AND (&&) and logical OR (||) use short-circuit evaluation: when evaluating expr1 && expr2, if expr1=false, expr2 is not evaluated, and the entire expression’s value is false; when evaluating expr1 || expr2, if expr1=true, expr2 is not evaluated, and the entire expression’s value is true.
func isEven(a: Int64): Bool {
    if ((a % 2) == 0) {
        println("${a} is an even number")
        true
    } else {
        println("${a} is not an even number")
        false
    }
}
main() {
    var a: Bool = isEven(2) && isEven(20)
    var b: Bool = isEven(3) && isEven(30) // isEven(3) returns false, b is false, isEven(30) is not evaluated
    a = isEven(4) || isEven(40) // isEven(4) returns true, a is true, isEven(40) is not evaluated
    b = isEven(5) || isEven(50)
}
Bitwise Operators
The Cangjie programming language supports one unary prefix bitwise operator: bitwise NOT (!), and five binary infix bitwise operators: left shift (<<), right shift (>>), bitwise AND (&), bitwise XOR (^), and bitwise OR (|). The operands of bitwise operators can only be integer types. Bitwise operations are performed by treating operands as binary sequences and applying logical operations (0 as false, 1 as true) or shift operations on each bit.
For shift operators, the operands must be integer types (but the two operands can be different integer types, e.g., the left operand is Int8, and the right operand is Int16). Additionally, the right operand cannot be negative for both left and right shifts (such errors detected at compile time will result in compilation errors; if they occur at runtime, an exception is thrown). For unsigned numbers, the shift and padding rules are: left shifts pad the low bits with 0 and discard the high bits, while right shifts pad the high bits with 0 and discard the low bits. For signed numbers, the shift and padding rules are:
- Positive numbers follow the same padding rules as unsigned numbers;
- Negative numbers pad the low bits with 0 for left shifts and discard the high bits;
- Negative numbers pad the high bits with 1 for right shifts and discard the low bits.
Moreover, if the number of bits to shift (right operand) is equal to or exceeds the operand’s width, it is considered a shift overflow. If detected at compile time, it results in an error; otherwise, a runtime exception is thrown.
var a = !10 // -11, every bit of 10 is inverted
a = !20 // -21, every bit of 20 is inverted
a = 10 << 1 // 20, conforms to shift and padding rules
// a = 1000 << -1 // Compilation error: shift operation overflow (right operand cannot be negative)
// a = 1000 << 100000000000 // Compilation error: shift operation overflow (shift out of bounds)
a = 10 << 1 << 1 // 40, conforms to shift and padding rules
a = 10 >> 1 // 5, conforms to shift and padding rules
a = 10 & 15 // 10
a = 10 ^ 15 // 5
a = 10 | 15 // 15
a = (1 ^ (8 & 15)) | 24 // 25
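The padding rules for signed and unsigned operands described above can be sketched as follows. This is a minimal illustration, not part of the original examples; the variable names neg and u are hypothetical, and the commented results assume the two's-complement behavior described in the rules.

```cangjie
main() {
    let neg: Int8 = -8      // bit pattern 1111_1000
    println(neg >> 1)       // -4: the high bit is padded with 1
    println(neg << 1)       // -16: the low bit is padded with 0
    let u: UInt8 = 0xF0     // bit pattern 1111_0000
    println(u >> 4)         // 15: the high bits are padded with 0
    println(u << 1)         // 224: the high bit is discarded
}
```

Fixed-width types (Int8 and UInt8) are used here so that the discarded and padded bits are easy to follow by hand.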
Increment and Decrement Operators
The increment (++) and decrement (--) operators perform operations to increase or decrease a value by 1 and can only be used as postfix operators. The increment (++) and decrement (--) operators are non-associative.
For the expression expr++ (or expr--), the following rules apply:
- The type of expr must be an integer type;
- Since expr++ (or expr--) is syntactic sugar for expr += 1 (or expr -= 1), expr must also be assignable;
- The type of expr++ (or expr--) is Unit.
Examples of increment and decrement expressions:
var i: Int32 = 5
i++ // i = 6
i-- // i = 5
i--++ // Syntax error
var j = 0
j = i-- // Semantic error
Integer Types
Integer types are divided into signed integer types and unsigned integer types.
Signed integer types include Int8, Int16, Int32, Int64, and IntNative, which are used to represent signed integer values with encoding lengths of 8-bit, 16-bit, 32-bit, 64-bit, and platform-dependent sizes, respectively.
Unsigned integer types include UInt8, UInt16, UInt32, UInt64, and UIntNative, which are used to represent unsigned integer values with encoding lengths of 8-bit, 16-bit, 32-bit, 64-bit, and platform-dependent sizes, respectively.
For a signed integer type with an encoding length of N, its representable range is: $-2^{N-1} \sim 2^{N-1}-1$; for an unsigned integer type with an encoding length of N, its representable range is: $0 \sim 2^{N}-1$. The following table lists the representable ranges of all integer types:
| Type | Representable Range |
|---|---|
| Int8 | $-2^7 \sim 2^7-1 (-128 \sim 127)$ |
| Int16 | $-2^{15} \sim 2^{15}-1 (-32,768 \sim 32,767)$ |
| Int32 | $-2^{31} \sim 2^{31}-1 (-2,147,483,648 \sim 2,147,483,647)$ |
| Int64 | $-2^{63} \sim 2^{63}-1 (-9,223,372,036,854,775,808 \sim 9,223,372,036,854,775,807)$ |
| IntNative | platform dependent |
| UInt8 | $0 \sim 2^8-1 (0 \sim 255)$ |
| UInt16 | $0 \sim 2^{16}-1 (0 \sim 65,535)$ |
| UInt32 | $0 \sim 2^{32}-1 (0 \sim 4,294,967,295)$ |
| UInt64 | $0 \sim 2^{64}-1 (0 \sim 18,446,744,073,709,551,615)$ |
| UIntNative | platform dependent |
The choice of which integer type to use in a program depends on the nature and range of the integers to be processed. When Int64 is suitable, it is preferred because its representable range is sufficiently large, and integer literals default to Int64 type in the absence of type context, avoiding unnecessary type conversions. Additionally, the Cangjie programming language provides IntNative and UIntNative as signed and unsigned integer types, respectively, with bit widths consistent with the current system. This means their sizes depend on the platform they run on, allowing them to automatically adapt to the system’s bit width in cross-platform development.
Integer Literals
Integer literals can be represented in 4 radix forms: binary (with 0b or 0B prefix), octal (with 0o or 0O prefix), decimal (no prefix), and hexadecimal (with 0x or 0X prefix). For example, the decimal number 24 can be represented as 0b00011000 (or 0B00011000) in binary, 0o30 (or 0O30) in octal, and 0x18 (or 0X18) in hexadecimal.
In any radix representation, underscores _ can be used as separators to improve readability, such as 0b0001_1000.
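As a small sketch of the radix forms and the underscore separator, the following program writes the decimal number 24 in all four notations and checks that they denote the same value. The variable names are illustrative only.

```cangjie
main() {
    let bin = 0b0001_1000   // binary, with an underscore separator
    let oct = 0o30          // octal
    let dec = 24            // decimal
    let hex = 0x18          // hexadecimal
    // All four literals have the default type Int64 and the same value.
    println(bin == dec && oct == dec && hex == dec) // true
}
```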
If the value of an integer literal exceeds the representable range of the required integer type in the context, the compiler will report an error.
let x: Int8 = 128 // Error, 128 out of the range of Int8
let y: UInt8 = 256 // Error, 256 out of the range of UInt8
let z: Int32 = 0x8000_0000 // Error, 0x8000_0000 out of the range of Int32
When using integer literals, suffixes can be added to explicitly specify the type of the literal. The correspondence between suffixes and types is as follows:
| Suffix | Type | Suffix | Type |
|---|---|---|---|
| i8 | Int8 | u8 | UInt8 |
| i16 | Int16 | u16 | UInt16 |
| i32 | Int32 | u32 | UInt32 |
| i64 | Int64 | u64 | UInt64 |
Integer literals with suffixes can be used in the following ways:
var x = 100i8 // x is 100 with type Int8
var y = 0x10u64 // y is 16 with type UInt64
var z = 0o432i32 // z is 282 with type Int32
Character Byte Literals
The Cangjie programming language supports character byte literals to facilitate the representation of UInt8 values using ASCII codes. A character byte literal consists of the character b, a pair of single quotes, and an ASCII character, for example:
var a = b'x' // a is 120 with type UInt8
var b = b'\n' // b is 10 with type UInt8
var c = b'\u{78}' // c is 120 with type UInt8
c = b'\u{90}' - b'\u{66}' + c // c is 162 with type UInt8
b'x' represents a literal value of type UInt8 with a value of 120. Additionally, the escape form b'\u{78}' can be used to represent a literal value of type UInt8 with a hexadecimal value of 0x78 or a decimal value of 120. Note that the \u escape sequence can contain at most two hexadecimal digits, and the value must be less than 256 (in decimal).
Operations Supported by Integer Types
Integer types natively support the following operators: arithmetic operators, bitwise operators, relational operators, increment and decrement operators, and compound assignment operators. The precedence of these operators can be found in the Operators section of the appendix.
Conversions are allowed between integer types, as well as between integer and floating-point types. Integer types can also be converted to character types. For specific syntax and rules on type conversions, refer to Numeric Type Conversions.
Note:
The operations mentioned in this chapter refer to those supported natively, without operator overloading.
Floating-Point Types
Floating-point types include Float16, Float32, and Float64, which are used to represent floating-point numbers (numbers with fractional parts, such as 3.14159, 8.24, and 0.1) with encoding lengths of 16-bit, 32-bit, and 64-bit, respectively. Float16, Float32, and Float64 correspond to the half-precision format (binary16), single-precision format (binary32), and double-precision format (binary64) in IEEE 754.
The precision (number of significant digits) of Float64 is approximately 15 digits, Float32 has a precision of about 6 digits, and Float16 has a precision of about 3 digits. The choice of floating-point type depends on the nature and range of the floating-point numbers to be processed in the code. When multiple floating-point types are suitable, the higher-precision type is preferred because lower-precision types are prone to accumulated calculation errors and have a limited range of precisely representable integers.
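The limited range of precisely representable integers can be illustrated with a short sketch. It assumes only the IEEE 754 binary32 and binary64 formats named above; the variable names are hypothetical. In binary32, integers above 2^24 are no longer all representable, so adding 1 to 16777216 leaves the value unchanged after rounding, while binary64 still represents the sum exactly.

```cangjie
main() {
    let a32: Float32 = 16777216.0   // 2^24
    let b32: Float32 = a32 + 1.0    // 2^24 + 1 is not representable in binary32
    println(a32 == b32)             // true
    let a64: Float64 = 16777216.0
    let b64: Float64 = a64 + 1.0    // exactly representable in binary64
    println(a64 == b64)             // false
}
```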
Floating-Point Literals
Floating-point literals can be represented in two radix forms: decimal and hexadecimal. In decimal notation, a floating-point literal must contain at least an integer part or a fractional part, and if there is no fractional part, it must include an exponent part (prefixed with e or E, with a base of 10). In hexadecimal notation, a floating-point literal must contain at least an integer part or fractional part (prefixed with 0x or 0X) and must include an exponent part (prefixed with p or P, with a base of 2).
The following examples demonstrate the use of floating-point literals:
let a: Float32 = 3.14 // a is 3.140000 with type Float32
let b: Float32 = 2e3 // b is 2000.000000 with type Float32
let c: Float32 = 2.4e-1 // c is 0.240000 with type Float32
let d: Float64 = .123e2 // d is 12.300000 with type Float64
let e: Float64 = 0x1.1p0 // e is 1.062500 with type Float64
let f: Float64 = 0x1p2 // f is 4.000000 with type Float64
let g: Float64 = 0x.2p4 // g is 2.000000 with type Float64
When using decimal floating-point literals, the type can be explicitly specified by adding a suffix. The correspondence between suffixes and types is as follows:
| Suffix | Type |
|---|---|
| f16 | Float16 |
| f32 | Float32 |
| f64 | Float64 |
Floating-point literals with suffixes can be used as shown below:
let a = 3.14f32 // a is 3.140000 with type Float32
let b = 2e3f32 // b is 2000.000000 with type Float32
let c = 2.4e-1f64 // c is 0.240000 with type Float64
let d = .123e2f64 // d is 12.300000 with type Float64
Supported Operations for Floating-Point Types
Floating-point types natively support the following operators: arithmetic operators, relational operators, and compound assignment operators. Floating-point types do not support increment and decrement operators.
Floating-point types can be converted between each other, as well as between floating-point types and integer types. For specific type conversion syntax and rules, refer to Numeric Type Conversions.
Boolean Type
The Boolean type is denoted by Bool and is used to represent logical true and false values.
Boolean Literals
The Boolean type has only two literals: true and false.
The following example demonstrates the use of Boolean literals:
let a: Bool = true
let b: Bool = false
Supported Operations for Boolean Type
The Boolean type supports the following operators: logical operators (logical NOT !, logical AND &&, logical OR ||), a subset of the relational operators (== and !=), and a subset of the compound assignment operators (&&= and ||=).
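A minimal sketch of these operators follows; the variable name ok is illustrative only.

```cangjie
main() {
    var ok = true
    ok &&= false            // equivalent to ok = ok && false
    println(ok)             // false
    ok ||= true             // equivalent to ok = ok || true
    println(ok)             // true
    println(!ok)            // false
    println(ok != false)    // true
}
```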
Character Type
The character type is represented using Rune, which can represent all characters in the Unicode character set.
Character Type Literals
Character type literals come in three forms: single characters, escape sequences, and universal characters. A Rune literal starts with the character r, followed by a character enclosed in either single or double quotes.
Single character literals examples:
let a: Rune = r'a'
let b: Rune = r"b"
Escape sequences are character sequences that reinterpret the following character. An escape sequence starts with the escape symbol \, followed by the character to be escaped. Examples:
let slash: Rune = r'\\'
let newLine: Rune = r'\n'
let tab: Rune = r'\t'
Universal characters start with \u, followed by 1 to 8 hexadecimal digits enclosed in curly braces, representing the corresponding Unicode character. Examples:
main() {
let he: Rune = r'\u{4f60}'
let llo: Rune = r'\u{597d}'
print(he)
print(llo)
}
Compiling and executing the above code will output:
你好
Supported Operations for Character Type
Character types support the following operators: relational operators, namely less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=), equal to (==), and not equal to (!=). These comparisons are based on the Unicode values of the characters.
Rune can be converted to UInt32, and integer types can be converted to Rune. For specific type conversion syntax and rules, please refer to Rune to UInt32 and Integer Type to Rune Conversion.
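The comparison and conversion rules above can be sketched as follows. The example assumes the UInt32(e) and Rune(e) conversion forms described in the referenced conversion section; the variable names are hypothetical.

```cangjie
main() {
    let lower: Rune = r'a'      // U+0061
    let upper: Rune = r'A'      // U+0041
    println(upper < lower)      // true: comparison uses the Unicode values
    let code = UInt32(lower)    // Rune to UInt32
    println("${code}")          // 97
    let next = Rune(code + 1)   // integer to Rune
    println(next)               // b
}
```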
String Type
The string type is denoted by String and is used to represent textual data, composed of a sequence of Unicode characters.
String Literals
String literals are categorized into three types: single-line string literals, multi-line string literals, and multi-line raw string literals.
Single-line string literals are defined within a pair of single or double quotes. The content within the quotes can consist of any number of arbitrary characters (except for unescaped quotes used to define the string literal and standalone \ characters). Single-line string literals must be written on the same line and cannot span multiple lines. Examples:
let s1: String = ""
let s2 = 'Hello Cangjie Lang'
let s3 = "\"Hello Cangjie Lang\""
let s4 = 'Hello Cangjie Lang\n'
Multi-line string literals must begin and end with three double quotes (""") or three single quotes ('''). The content of the literal starts from the first line after the opening three quotes and continues until the first occurrence of non-escaped three quotes. The content can include any number of arbitrary characters (except for standalone \ characters). Unlike single-line string literals, multi-line string literals can span multiple lines. Examples:
let s1: String = """
"""
let s2 = '''
Hello,
Cangjie Lang'''
Multi-line raw string literals begin with one or more hash symbols (#) followed by a single quote (') or double quote ("), followed by any number of valid characters until the same quote and the same number of hash symbols as the opening are encountered. If the matching quote and hash symbols are not found before the end of the file, a compilation error occurs. Like multi-line string literals, multi-line raw string literals can span multiple lines. The difference is that escape rules do not apply to them; the content remains as-is (escape characters are not interpreted, e.g., \n in s2 below is not a newline character but the two-character string \n composed of \ and n). Examples:
let s1: String = #""#
let s2 = ##'#'\n'## // Output: #'\n
let s3 = ###"
Hello,
Cangjie
Lang"### // Line breaks and indentation in this variable are preserved
For assignment operations of the form left = right, if the left operand is of type Byte (an alias for the built-in type UInt8) and the right operand is a string literal representing an ASCII character, the string will be coerced into the Byte type before assignment. If the left operand is of type Rune and the right operand is a single-character string literal, the string will be coerced into the Rune type before assignment.
main() {
var b: Byte = "0"
print(b)
b = "1"
print(b)
var r: Rune = "0"
print(r)
r = "1"
print(r)
}
Compiling and executing the above code yields the following output:
484901
Interpolated Strings
An interpolated string is a string literal (not applicable to multi-line raw string literals) containing one or more interpolated expressions. By embedding expressions within the string, it effectively avoids the need for string concatenation. Interpolated strings are commonly used in the println function to output non-string variable values, e.g., println("${x}").
Interpolated expressions must be enclosed in curly braces {} and prefixed with $. The {} can contain one or more declarations or expressions.
When an interpolated string is evaluated, each interpolated expression is replaced by the value of the last item within {}, and the entire interpolated string remains a string.
Below is a simple example of interpolated strings:
main() {
let fruit = "apples"
let count = 10
let s = "There are ${count * count} ${fruit}"
println(s)
let r = 2.4
let area = "The area of a circle with radius ${r} is ${let PI = 3.141592; PI * r * r}"
println(area)
}
Compiling and executing the above code yields the following output:
There are 100 apples
The area of a circle with radius 2.400000 is 18.095570
Operations Supported by String Type
The string type supports comparison using relational operators and concatenation using +. The following example demonstrates string equality checks and concatenation:
main() {
let s1 = "abc"
var s2 = "ABC"
let r1 = s1 == s2
println("The result of 'abc' == 'ABC' is: ${r1}")
let r2 = s1 + s2
println("The result of 'abc' + 'ABC' is: ${r2}")
}
Compiling and executing the above code yields the following output:
The result of 'abc' == 'ABC' is: false
The result of 'abc' + 'ABC' is: abcABC
Strings also support other common operations, such as splitting and replacing. Below are some examples:
main() {
var s1 = "abc"
var s2 = "ABCabc"
var s3 = "abcyyabcqqabcbc"
let r1 = s2.contains(s1) // Checks if s2 contains the string s1
println(r1) // true
let r2 = s3.split(s1) // Splits the original string s3 using s1 as the delimiter
println(r2[1]) // yy
s1 = s2
println(s1) // ABCabc
}
Tuple Type
A tuple (Tuple) can combine multiple different types into a new type. The tuple type is denoted as (T1, T2, ..., TN), where T1 to TN can be any type, separated by commas (,). A tuple must have at least two elements. For example, (Int64, Float64) represents a binary tuple type, and (Int64, Float64, String) represents a ternary tuple type.
The length of a tuple is fixed, meaning once an instance of a tuple type is defined, its length cannot be changed.
The tuple type is immutable, meaning once an instance of a tuple type is defined, its content (i.e., individual elements) cannot be updated. However, the entire tuple can be overwritten or replaced, for example:
let tuple1 = (8, false)
var tuple2 = (true, 9, 20)
tuple2 = tuple1 // Error, mismatched types
tuple2[0] = false // Error, 'tuple element' can not be assigned
var tuple3 = (9, true)
tuple3 = tuple1
println(tuple3[0]) // 8
println(tuple3[1]) // false
Tuple Type Literals
The literal of a tuple type is denoted as (e1, e2, ..., eN), where e1 to eN are expressions, and multiple expressions are separated by commas. In the following example, a variable x of type (Int64, Float64) and a variable y of type (Int64, Float64, String) are defined, and tuple type literals are used to assign initial values to them:
let x: (Int64, Float64) = (3, 3.141592)
let y: (Int64, Float64, String) = (3, 3.141592, "PI")
Tuples support accessing elements at specific positions via t[index], where t is a tuple and index is a subscript. The index must be an integer literal starting from 0 and less than the number of tuple elements; otherwise, a compilation error will occur. In the following example, pi[0] and pi[1] are used to access the first and second elements of the binary tuple pi, respectively.
main() {
var pi = (3.14, "PI")
println(pi[0])
println(pi[1])
}
Compiling and executing the above code will output:
3.140000
PI
In assignment expressions, tuples can be used for multiple assignments. Refer to the Assignment Operators section.
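As a brief sketch of multiple assignment, a tuple on the left-hand side receives the corresponding elements of the tuple on the right-hand side, which also makes swapping two variables a one-liner. The variable names are illustrative only.

```cangjie
main() {
    var a = 1
    var b = 2
    (a, b) = (b, a)                 // swap the two values
    println("a = ${a}, b = ${b}")   // a = 2, b = 1
    (a, _) = (10, 20)               // the wildcard _ ignores the second element
    println("a = ${a}, b = ${b}")   // a = 10, b = 1
}
```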
Type Parameters of Tuple Types
Tuple types can declare explicit type parameter names. In the following example, name and price are type parameter names.
func getFruitPrice(): (name: String, price: Int64) {
return ("banana", 10)
}
main() {
let tmp = getFruitPrice()
var a = tmp[0]
var b = tmp[1]
b++
println("b = ${b}, tmp[1] = ${tmp[1]}")
}
Compiling and executing the above code will output:
b = 11, tmp[1] = 10
For a tuple type, type parameter names must either all be written or all be omitted. Alternating between named and unnamed parameters is not allowed, and the parameter names themselves cannot be used as variables or to access elements in the tuple.
let a: (name: String, Int64) = ("banana", 5) // Error, in a parameter type list, either all parameters must be named, or none of them
let b: (name: String, price: Int64) = ("banana", 5) // OK
b.name // Error, undeclared identifier 'name'
Array Types
Array
The Array type can be used to construct an ordered sequence of data with a single element type.
Cangjie uses Array<T> to represent the Array type, where T denotes the element type of the Array, and T can be any type.
var a: Array<Int64> = [0, 0, 0, 0] // Array whose element type is Int64
var b: Array<String> = ["a1", "a2", "a3"] // Array whose element type is String
Arrays with different element types are considered distinct types, so they cannot be assigned to each other.
Thus, the following example is invalid:
b = a // Type mismatch
An Array can be easily initialized using literals by enclosing a comma-separated list of values in square brackets.
The compiler automatically infers the type of the Array literal based on context.
let a: Array<String> = [] // Created an empty Array whose element type is String
let b = [1, 2, 3, 3, 2, 1] // Created an Array whose element type is Int64, containing elements 1, 2, 3, 3, 2, 1
An Array can also be constructed using a constructor with a specified element type. Here, repeat is a named parameter in the Array constructor.
Note that when initializing an Array with the repeat parameter, the constructor does not copy repeat. If repeat is a reference type, every element in the constructed Array will point to the same reference.
let a = Array<Int64>() // Created an empty Array whose element type is Int64
let c = Array<Int64>(3, repeat: 0) // Created an Array whose element type is Int64, length is 3 and all elements are initialized as 0
let d = Array<Int64>(3, {i => i + 1}) // Created an Array whose element type is Int64, length is 3 and all elements are initialized by the initialization function
In the example let d = Array<Int64>(3, {i => i + 1}), a lambda expression is used as the initialization function to initialize each element in the Array, i.e., {i => i + 1}.
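The note about repeat not being copied can be illustrated with a small sketch. It relies on the fact that an Array holds a reference to its element data, so an Array used as the repeat value is shared by every element; the names row, grid, and first are hypothetical.

```cangjie
main() {
    let row = [0, 0]                                // Array<Int64>
    let grid = Array<Array<Int64>>(2, repeat: row)  // both elements share row's data
    let first = grid[0]
    first[0] = 7                                    // modify through the first element
    println(grid[1][0])                             // 7: the second element sees the change
    println(row[0])                                 // 7: so does the original array
}
```

When independent elements are needed, the initialization-function constructor is one way to create a fresh value per position instead of sharing a single one.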
Accessing Array Members
To access all elements of an Array, you can use a for-in loop to iterate through them.
Arrays are ordered by insertion, so the traversal order is always consistent.
main() {
let arr = [0, 1, 2]
for (i in arr) {
println("The element is ${i}")
}
}
Compiling and executing the above code will output:
The element is 0
The element is 1
The element is 2
To determine the number of elements in an Array, use the size property.
main() {
let arr = [0, 1, 2]
if (arr.size == 0) {
println("This is an empty array")
} else {
println("The size of array is ${arr.size}")
}
}
Compiling and executing the above code will output:
The size of array is 3
To access a single element at a specific position, use subscript syntax (the subscript must be of type Int64). The first element of a non-empty Array always starts at position 0. You can access any element from 0 up to the last position (Array.size - 1). Negative indices or indices greater than or equal to size are invalid. If the compiler detects an invalid index, it will report an error at compile time; otherwise, it will throw an exception at runtime.
main() {
let arr = [0, 1, 2]
let a = arr[0] // a == 0
let b = arr[1] // b == 1
let c = arr[-1] // array size is '3', but access index is '-1', which would overflow
}
To retrieve a segment of an Array, you can pass a Range type value to the subscript, which will return a sub-Array corresponding to the specified range.
let arr1 = [0, 1, 2, 3, 4, 5, 6]
let arr2 = arr1[0..5] // arr2 contains the elements 0, 1, 2, 3, 4
When a Range literal is used in subscript syntax, the start or end can be omitted.
If start is omitted, the Range starts from 0; if end is omitted, the Range extends to the last element.
let arr1 = [0, 1, 2, 3, 4, 5, 6]
let arr2 = arr1[..3] // arr2 contains elements 0, 1, 2
let arr3 = arr1[2..] // arr3 contains elements 2, 3, 4, 5, 6
Modifying Arrays
Arrays are fixed-length Collection types, so they do not provide member functions for adding or removing elements.
However, Arrays allow modification of their elements using subscript syntax.
main() {
let arr = [0, 1, 2, 3, 4, 5]
arr[0] = 3
println("The first element is ${arr[0]}")
}
Compiling and executing the above code will output:
The first element is 3
Although Arrays are struct types, they internally hold references to elements. Thus, when used as expressions, they do not create copies. All references to the same Array instance share the same element data.
Therefore, modifications to an Array’s elements affect all references to that instance.
let arr1 = [0, 1, 2]
let arr2 = arr1
arr2[0] = 3
// arr1 contains elements 3, 1, 2
// arr2 contains elements 3, 1, 2
VArray
In addition to the reference-type Array, Cangjie introduces a value-type array VArray<T, $N>, where T is the element type and $N is fixed syntax: a $ followed by an Int64 literal that denotes the length of the value-type array. Note that <T, $N> cannot be omitted from VArray<T, $N>, and when using type aliases, the VArray keyword and its generic parameters cannot be split.
Compared to frequently using reference-type Arrays, value-type VArrays reduce heap memory allocation and garbage collection pressure. However, due to the overhead of copying during value-type passing and assignment, it is not recommended to use large VArrays in performance-sensitive scenarios. For characteristics of value types and reference types, refer to Value Types and Reference Type Variables.
type varr1 = VArray<Int64, $3> // OK
type varr2 = VArray // Error
Note:
Due to runtime backend limitations, the element type T of VArray<T, $N> or its members cannot contain reference types, enum types, lambda expressions (except CFunc), or uninstantiated generic types.
A VArray can be initialized using an array literal, where the left-hand side a must specify the instantiated type of VArray:
var a: VArray<Int64, $3> = [1, 2, 3]
It also has two constructors:
// VArray<T, $N>(initElement: (Int64) -> T)
let b = VArray<Int64, $5>({ i => i }) // [0, 1, 2, 3, 4]
// VArray<T, $N>(repeat!: T)
let c = VArray<Int64, $5>(repeat: 0) // [0, 0, 0, 0, 0]
Additionally, VArray<T, $N> provides two member methods:
- The [] operator method for subscript access and modification:
var a: VArray<Int64, $3> = [1, 2, 3]
let i = a[1] // i is 2
a[2] = 4 // a is [1, 2, 4]
The subscript must be of type Int64.
- The size member to get the length of the VArray:
var a: VArray<Int64, $3> = [1, 2, 3]
let s = a.size // s is 3
The size property is of type Int64.
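The copy-on-assignment behavior of value-type arrays mentioned earlier can be sketched as follows. This is an illustration under the value semantics described above, with hypothetical variable names: assigning a VArray copies its elements, so later changes to the copy do not affect the original.

```cangjie
main() {
    var a: VArray<Int64, $3> = [1, 2, 3]
    var b = a           // value type: b receives its own copy of the elements
    b[0] = 9
    println(a[0])       // 1: a is unchanged
    println(b[0])       // 9
}
```

This contrasts with the reference-type Array, where two variables bound to the same instance share the element data.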
Furthermore, VArray supports interoperability between Cangjie and C. For details, refer to Arrays.
Range Type
The range type is used to represent sequences with a fixed step size. It is a generic type denoted as Range<T>. Instantiating T with different types yields different range types; in each case, the type must support relational operators and addition with values of type Int64. For example, the most commonly used Range<Int64> represents integer ranges.
Each instance of a range type contains three values: start, end, and step. Here, start and end represent the initial and terminal values of the sequence, respectively, while step denotes the difference between consecutive elements (i.e., the step size). The types of start and end are the same (i.e., the type with which T is instantiated), whereas step is of type Int64 and cannot be equal to 0.
The following example demonstrates how to instantiate range types (for details on range type definitions and their properties, refer to the Cangjie Programming Language Library API):
// Range<T>(start: T, end: T, step: Int64, hasStart: Bool, hasEnd: Bool, isClosed: Bool)
let r1 = Range<Int64>(0, 10, 1, true, true, true) // r1 contains 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
let r2 = Range<Int64>(0, 10, 1, true, true, false) // r2 contains 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
let r3 = Range<Int64>(10, 0, -2, true, true, false) // r3 contains 10, 8, 6, 4, 2
Range Type Literals
Range literals come in two forms: “left-closed right-open” and “left-closed right-closed” ranges.
- The “left-closed right-open” range is formatted as start..end : step, representing a range that starts at start, increments by step, and ends before end (excluding end).
- The “left-closed right-closed” range is formatted as start..=end : step, representing a range that starts at start, increments by step, and includes end (including end).
The following example defines several range-type variables:
let n = 10
let r1 = 0..10 : 1 // r1 contains 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
let r2 = 0..=n : 1 // r2 contains 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
let r3 = n..0 : -2 // r3 contains 10, 8, 6, 4, 2
let r4 = 10..=0 : -2 // r4 contains 10, 8, 6, 4, 2, 0
In range literals, step can be omitted, in which case it defaults to 1. However, the value of step cannot be 0. Additionally, a range may be empty (i.e., a sequence containing no elements), as shown below:
let r5 = 0..10 // the step of r5 is 1, and r5 contains 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
let r6 = 0..10 : 0 // Error, step cannot be 0
let r7 = 10..0 : 1 // r7 to r10 are empty ranges
let r8 = 0..10 : -1
let r9 = 10..=0 : 1
let r10 = 0..=10 : -1
Note:
- For the expression start..end : step, when step > 0 and start >= end, or when step < 0 and start <= end, start..end : step is an empty range.
- For the expression start..=end : step, when step > 0 and start > end, or when step < 0 and start < end, start..=end : step is an empty range.
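To show how step and empty ranges behave in practice, the following sketch iterates over two range literals with for-in loops. The variable names sum and count are illustrative only.

```cangjie
main() {
    var sum = 0
    for (i in 1..=10 : 2) {     // visits 1, 3, 5, 7, 9
        sum += i
    }
    println(sum)                // 25
    var count = 0
    for (_ in 10..0 : 1) {      // empty range: step > 0 and start >= end
        count++
    }
    println(count)              // 0: the loop body never runs
}
```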
The Unit Type
For expressions that only concern side effects without caring about values, their type is Unit. For example, the print function, assignment expressions, compound assignment expressions, increment and decrement expressions, and loop expressions all have the Unit type.
The Unit type has only one value, which is also its literal: (). Apart from assignment, equality checks, and inequality checks, the Unit type does not support other operations.
The Nothing Type
Nothing is a special type that contains no values, and the Nothing type is a subtype of all types (including the Unit type).
The break, continue, return, and throw expressions are of type Nothing. When program execution reaches these expressions, the code following them will not be executed. return can only be used within a function body, while break and continue can only be used within loop bodies. Refer to the following example:
while (true) {
func f() {
break // Error, break must be used directly inside a loop
}
let g = { =>
continue // Error, continue must be used directly inside a loop
}
}
Since function parameters and their default values do not belong to the function body, the return expression in the following example lacks an enclosing function body—it neither belongs to the outer function f (because the inner function definition g has already started) nor is it within the inner function g’s body (for related content on this use case, refer to Nested Functions):
func f() {
func g(x!: Int64 = return) { // Error, return must be used inside a function body
0
}
1
}
Note:
Currently, the compiler does not allow explicit use of the Nothing type in places where types are required.
Defining Functions
In Cangjie, the keyword func is used to denote the start of a function definition. Following func are the function name, parameter list, optional return type, and function body in sequence. The function name can be any valid identifier. The parameter list is enclosed in parentheses (with multiple parameters separated by commas), a colon separates the parameter list from the return type (if present), and the function body is enclosed in curly braces.
Example of a function definition:
func add(a: Int64, b: Int64): Int64 {
return a + b
}
The above example defines a function named add with a parameter list consisting of two Int64 parameters a and b, a return type of Int64, and a function body that returns the sum of a and b.
The following sections provide further details on the parameter list, return type, and function body in function definitions.
Parameter List
A function can have zero or more parameters, all defined in the parameter list. Parameters in the parameter list can be categorized into two types based on whether parameter names are required during function calls: non-named parameters and named parameters.
Non-named parameters are defined as p: T, where p is the parameter name and T is the type of parameter p, connected by a colon. For example, the parameters a and b in the add function above are non-named parameters.
Named parameters are defined as p!: T, differing from non-named parameters by the addition of ! after the parameter name p. The non-named parameters in the add function can be modified to named parameters as shown below:
func add(a!: Int64, b!: Int64): Int64 {
return a + b
}
Named parameters can also have default values, specified via p!: T = e, where the default value of parameter p is set to the value of expression e. For example, the default values of the two parameters in the add function can be set to 1:
func add(a!: Int64 = 1, b!: Int64 = 1): Int64 {
return a + b
}
Note:
Only named parameters can have default values; non-named parameters cannot.
A parameter list can include both non-named and named parameters. However, non-named parameters must be defined before named parameters, meaning no non-named parameters can appear after named parameters. For example, the following parameter list definition for the add function is invalid:
func add(a!: Int64, b: Int64): Int64 { // Error, named parameter 'a' must be defined after non-named parameter 'b'
return a + b
}
The primary difference between non-named and named parameters lies in their behavior during function calls. For details, refer to the Calling Functions section below.
Function parameters are immutable variables and cannot be assigned values within the function definition.
func add(a: Int64, b: Int64): Int64 {
a = a + b // Error
return a
}
The scope of function parameters extends from their definition to the end of the function body:
func add(a: Int64, b: Int64): Int64 {
var a_ = a // OK
var b = b // Error, redefinition of declaration 'b'
return a
}
Return Type
The return type of a function is the type of the value obtained when the function is called. In function definitions, the return type is optional: it can be explicitly defined (placed between the parameter list and the function body) or omitted, leaving it to the compiler to infer.
When the return type is explicitly defined, the type of the function body (see the Function Body section below for how the function body type is determined) and the types of all return e expressions in the function body must be subtypes of the return type. For example, in the add function above, the return type is explicitly defined as Int64. If the function body is modified to return (a, b), a type mismatch error will occur:
// Error, the type of the expression after return does not match the return type of the function
func add(a: Int64, b: Int64): Int64 {
return (a, b)
}
If the return type is not explicitly defined in the function definition, the compiler infers it based on the type of the function body and all return expressions in the function body. For example, in the following add function, the return type is omitted, but the compiler can infer it as Int64 from return a + b:
func add(a: Int64, b: Int64) {
return a + b
}
Note:
The return type of a function cannot always be inferred. If type inference fails, the compiler will report an error.
When the return type is specified as Unit, the compiler automatically inserts return () at all possible return points in the function body, ensuring the return type is always Unit.
Function Body
The function body defines the operations executed when the function is called. It typically consists of a series of variable definitions and expressions and can also include nested function definitions. For example, the function body of the add function below first defines a variable r of type Int64 (initialized to 0), assigns the value of a + b to r, and finally returns r:
func add(a: Int64, b: Int64) {
var r = 0
r = a + b
return r
}
A return expression can be used anywhere in the function body to terminate the function’s execution and return a value. There are two forms of return expressions: return and return expr (where expr is an expression).
For return expr, the type of expr must match the function’s return type. For example, the following example will error because the type of 100 (Int64) does not match the return type of function foo (String):
// Error, cannot convert an integer literal to type 'Struct-String'
func foo(): String {
return 100
}
A bare return is equivalent to return (), so it can only be used in a function whose return type is Unit.
func add(a: Int64, b: Int64) {
var r = 0
r = a + b
return r
}
func foo(): Unit {
add(1, 2)
return
}
Note:
The return expression as a whole has the type Nothing, regardless of the expression that follows it.
Variables defined within the function body are a type of local variable (e.g., variable r in the example above). Their scope extends from their definition to the end of the function body.
A local variable may have the same name as a variable defined in an outer scope. Within the local variable’s scope, the local variable “shadows” the variable of the same name in the outer scope. For example:
let r = 0
func add(a: Int64, b: Int64) {
var r = 0
r = a + b
return r
}
In the example above, a global variable r of type Int64 is defined before the add function, and a local variable r with the same name is defined within the function body. Within the function body, all references to r (e.g., r = a + b) will refer to the local variable r, meaning the local variable r “shadows” the global variable r within the function body.
As mentioned in the Return Type section, the function body also has a type. The type of the function body is the type of the last “item” in the function body: if the last item is an expression, the function body’s type is the type of that expression; if the last item is a variable definition, function declaration, or the function body is empty, the function body’s type is Unit. For example:
func add(a: Int64, b: Int64): Int64 {
a + b
}
In the example above, since the last “item” in the function body is an expression of type Int64 (i.e., a + b), the function body’s type is also Int64, matching the function’s return type. Similarly, in the following example, the last item in the function body is a call to the print function, so the function body’s type is Unit, which also matches the function’s return type:
func foo(): Unit {
let s = "Hello"
print(s)
}
Function Invocation
A function is invoked in the form f(arg1, arg2, ..., argn), where f is the name of the function to be called, and arg1 through argn are the n arguments (called actual parameters) passed during invocation. Each actual parameter’s type must be a subtype of the corresponding parameter’s type. The number of actual parameters can range from zero to multiple. When there are zero parameters, the invocation takes the form f().
Depending on whether a parameter in the function definition is non-named or named, the way its argument is passed during invocation differs:
- For non-named parameters, the corresponding argument is an expression.
- For named parameters, the argument must be passed in the form p: e, where p is the name of the named parameter and e is the expression (i.e., the value passed to parameter p).
Example of invocation with non-named parameters:
func add(a: Int64, b: Int64) {
return a + b
}
main() {
let x = 1
let y = 2
let r = add(x, y)
println("The sum of x and y is ${r}")
}
Execution result:
The sum of x and y is 3
Example of named parameter invocation:
func add(a: Int64, b!: Int64) {
return a + b
}
main() {
let x = 1
let y = 2
let r = add(x, b: y)
println("The sum of x and y is ${r}")
}
Execution result:
The sum of x and y is 3
For functions with multiple named parameters, the order of passing arguments during invocation can differ from the parameter order in the definition. For example, in the following case, parameter b can appear before a when invoking the add function:
func add(a!: Int64, b!: Int64) {
return a + b
}
main() {
let x = 1
let y = 2
let r = add(b: y, a: x)
println("The sum of x and y is ${r}")
}
Execution result:
The sum of x and y is 3
For named parameters with default values, if no argument is passed during invocation, the parameter will use its default value. For example, in the following case, when invoking the add function, no argument is passed for parameter b, so its value defaults to 2 as defined:
func add(a: Int64, b!: Int64 = 2) {
return a + b
}
main() {
let x = 1
let r = add(x)
println("The sum of x and y is ${r}")
}
Execution result:
The sum of x and y is 3
For named parameters with default values, new arguments can also be passed during invocation. In this case, the parameter’s value will be the new argument’s value, overriding the default. For example, in the following case, when invoking the add function, a new argument value 20 is passed for parameter b, so its value becomes 20:
func add(a: Int64, b!: Int64 = 2) {
return a + b
}
main() {
let x = 1
let r = add(x, b: 20)
println("The sum of x and y is ${r}")
}
Execution result:
The sum of x and y is 21
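The invocation rules above can be combined in a single function. The following is a minimal sketch; the function describe and its parameters width and height are hypothetical names introduced only for illustration:

```cangjie
// One non-named parameter followed by two named parameters with default values.
func describe(name: String, width!: Int64 = 80, height!: Int64 = 24): String {
    "${name}: ${width}x${height}"
}

main() {
    println(describe("default"))                        // default: 80x24
    println(describe("wide", width: 120))               // wide: 120x24
    println(describe("custom", height: 50, width: 100)) // custom: 100x50
}
```

The non-named argument always comes first and is matched by position, while the named arguments may be omitted (falling back to their defaults) or passed in any order.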
Function Types
In the Cangjie programming language, functions are first-class citizens, meaning they can be passed as arguments to other functions, returned as values from functions, or assigned to variables. Therefore, functions themselves have types, referred to as function types.
A function type consists of the function’s parameter types and return type, connected by ->. Parameter types are enclosed in parentheses (), which can contain zero or more parameters. If there are multiple parameters, their types are separated by commas (,).
For example:
func hello(): Unit {
println("Hello!")
}
The above example defines a function named hello with the type () -> Unit, indicating that this function takes no parameters and returns Unit.
Here are some additional examples:
- Example: A function named display with type (Int64) -> Unit, indicating it takes one parameter of type Int64 and returns Unit.
func display(a: Int64): Unit {
println(a)
}
- Example: A function named add with type (Int64, Int64) -> Int64, indicating it takes two parameters of type Int64 and returns Int64.
func add(a: Int64, b: Int64): Int64 {
a + b
}
- Example: A function named returnTuple with type (Int64, Int64) -> (Int64, Int64), taking two Int64 parameters and returning a tuple type (Int64, Int64).
func returnTuple(a: Int64, b: Int64): (Int64, Int64) {
(a, b)
}
Type Parameters in Function Types
The parameter types in a function type can be labeled with explicit names, referred to here as type parameter names. In the following example, name and price are type parameter names.
func showFruitPrice(name: String, price: Int64) {
println("fruit: ${name} price: ${price} yuan")
}
main() {
let fruitPriceHandler: (name: String, price: Int64) -> Unit
fruitPriceHandler = showFruitPrice
fruitPriceHandler("banana", 10)
}
Note that for a function type, you must either consistently include type parameter names or omit them entirely; mixing them is not allowed.
let handler: (name: String, Int64) -> Int64 // Error
Function Types as Parameter Types
Example: A function named printAdd with type ((Int64, Int64) -> Int64, Int64, Int64) -> Unit, indicating it takes three parameters: a function type (Int64, Int64) -> Int64 and two Int64 values, returning Unit.
func printAdd(add: (Int64, Int64) -> Int64, a: Int64, b: Int64): Unit {
println(add(a, b))
}
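As a usage sketch, printAdd can receive either a named function or a lambda expression as its first argument, as long as the argument has type (Int64, Int64) -> Int64. The helper plus below is a hypothetical name introduced for illustration:

```cangjie
func printAdd(add: (Int64, Int64) -> Int64, a: Int64, b: Int64): Unit {
    println(add(a, b))
}

// Hypothetical helper with type (Int64, Int64) -> Int64.
func plus(a: Int64, b: Int64): Int64 {
    a + b
}

main() {
    printAdd(plus, 1, 2)                             // 3
    printAdd({ x: Int64, y: Int64 => x * y }, 3, 4)  // 12
}
```

The parameter is named add, but any function of the matching type can be passed, which is why the second call prints a product.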
Function Types as Return Types
Function types can also serve as the return type of another function.
In the following example, the function returnAdd has type () -> (Int64, Int64) -> Int64, meaning it takes no parameters and returns a function of type (Int64, Int64) -> Int64. Note that -> is right-associative.
func add(a: Int64, b: Int64): Int64 {
a + b
}
func returnAdd(): (Int64, Int64) -> Int64 {
add
}
main() {
var a = returnAdd()
println(a(1,2))
}
Function Types as Variable Types
Function names themselves are expressions, and their types correspond to their function types.
func add(p1: Int64, p2: Int64): Int64 {
p1 + p2
}
let f: (Int64, Int64) -> Int64 = add
In the above example, the function add has type (Int64, Int64) -> Int64. The variable f is declared with the same type and initialized with add.
If a function is overloaded in the current scope (see Function Overloading), using the function name directly as an expression may cause ambiguity. In such cases, the compiler will report an error:
func add(i: Int64, j: Int64) {
i + j
}
func add(i: Float64, j: Float64) {
i + j
}
main() {
var f = add // Error, ambiguous function 'add'
var plus: (Int64, Int64) -> Int64 = add // OK
}
Nested Functions
Functions defined at the top level of a source file are called global functions. Functions defined within the body of another function are called nested functions.
Scope of Usage:
- The scope of a nested function is limited to its enclosing outer function. Nested functions can access variables and parameters of the outer function, but the outer function cannot directly access the internal variables of nested functions.
- Nested functions can be called by the outer function or returned by the outer function.
Lifecycle:
- The lifecycle of a nested function is closely tied to its outer function. Each time the outer function is called, the nested function is created; when the outer function completes execution, the nested function is typically destroyed unless it is externally referenced through return values or closures.
Usage Rules and Considerations:
- Use nested functions only within their corresponding outer functions.
- Avoid excessive nesting. Deeply nested functions complicate the code structure and make it difficult to understand and maintain.
- Be mindful of closure usage. If a nested function is returned and used as a closure, it may capture variables from the outer function. These variables then stay alive after the outer function completes, which affects memory usage.
Example: The function foo defines a nested function nestAdd inside it. The nested function nestAdd can be called within foo, or it can be returned as a value to be called outside foo:
func foo() {
func nestAdd(a: Int64, b: Int64) {
a + b + 3
}
println(nestAdd(1, 2)) // 6
return nestAdd
}
main() {
let f = foo()
let x = f(1, 2)
println("result: ${x}")
}
The program will output:
6
result: 6
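A nested function can also read the parameters of its outer function, and each call to the outer function produces a separate nested function. The following is a minimal sketch of that behavior; makeScaler and scale are hypothetical names introduced for illustration:

```cangjie
func makeScaler(factor: Int64): (Int64) -> Int64 {
    // scale is a nested function that reads the outer parameter factor.
    func scale(x: Int64): Int64 {
        x * factor
    }
    scale
}

main() {
    let twofold = makeScaler(2)   // a nested function created with factor = 2
    let threefold = makeScaler(3) // a separate one created with factor = 3
    println(twofold(5))   // 10
    println(threefold(5)) // 15
}
```

Because function parameters are immutable, the returned function can be used freely as a value; the restrictions on captured var variables are described in the Closures section.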
Lambda Expressions
Definition of Lambda Expressions
A lambda expression is an anonymous function (i.e., a function without a name) designed to quickly define concise function logic in programs without explicitly declaring a function name. This concept originates from lambda calculus in mathematics and has been introduced into various programming languages (such as C++, Python, C#, etc.) to simplify code and enhance flexibility. The Cangjie programming language also incorporates lambda expressions, and their usage will be detailed in this section.
The syntax of a lambda expression is as follows: { p1: T1, ..., pn: Tn => expressions | declarations }.
Here, the part before => is the parameter list, where multiple parameters are separated by ,, and each parameter name and type are separated by :. The part before => can also be empty (parameterless). The part after => is the lambda expression body, which consists of a sequence of expressions or declarations. The scope of lambda expression parameter names is the same as that of functions, limited to the lambda expression body, and their scope level is equivalent to variables defined within the lambda expression body.
let f1 = { a: Int64, b: Int64 => a + b }
var display = { => // Parameterless lambda expression.
println("Hello")
println("World")
}
Lambda expressions, whether they have parameters or not, cannot omit =>, unless they are used as trailing lambdas. For example:
var display = { => println("Hello") }
func f2(lam: () -> Unit) {}
let f2Res = f2 { println("World") } // OK to omit the =>
Type annotations for parameters in lambda expressions can be omitted. In the following cases, if parameter types are omitted, the compiler will attempt type inference. If the compiler cannot infer the type, a compilation error will occur:
- When a lambda expression is assigned to a variable, its parameter types are inferred from the variable type;
- When a lambda expression is used as an argument in a function call, its parameter types are inferred from the function’s parameter types.
// The parameter types are inferred from the type of the variable sum1
var sum1: (Int64, Int64) -> Int64 = { a, b => a + b }
var sum2: (Int64, Int64) -> Int64 = { a: Int64, b => a + b }
func f(a1: (Int64) -> Int64): Int64 {
a1(1)
}
main(): Int64 {
// The parameter type of lambda is inferred from the type of function f
f({ a2 => a2 + 10 })
}
Lambda expressions do not support explicit return type declarations. Their return type is always inferred from the context, and an error is reported if inference fails.
- If the context explicitly specifies the return type of the lambda expression, its return type is the context-specified type.
  - When a lambda expression is assigned to a variable, its return type is inferred from the variable type:
let f: () -> Unit = { => println(10) }
  - When a lambda expression is used as an argument, its return type is inferred from the parameter type of the function call:
func f(a1: (Int64) -> Int64): Int64 {
a1(1)
}
main(): Int64 {
f({ a2: Int64 => a2 + 10 })
}
  - When a lambda expression is used as a return value, its return type is inferred from the return type of the enclosing function:
func f(): (Int64) -> Int64 {
{ a: Int64 => a }
}
- If the context does not explicitly specify the type, similar to deriving the return type of a function, the compiler will infer the return type of the lambda expression based on the types of all return xxx expressions in the lambda body and the type of the lambda body itself.
  - The content to the right of => follows the same rules as a regular function body, with the return type being Int64:
let sum1 = { a: Int64, b: Int64 => a + b }
  - If the right side of => is empty, the return type is Unit:
let f = { => }
Lambda Expression Invocation
Lambda expressions support immediate invocation. For example:
let r1 = { a: Int64, b: Int64 => a + b }(1, 2) // r1 = 3
let r2 = { => 123 }() // r2 = 123
Lambda expressions can also be assigned to a variable and invoked using the variable name. For example:
func f() {
var g = { x: Int64 => println("x = ${x}") }
g(2)
}
Closures
A function or lambda that captures variables from its static scope is referred to as a closure. The combination of the function/lambda and the captured variables forms a closure, enabling it to operate correctly even when outside the scope where it was defined.
Variable capture occurs in a function or lambda definition when accessing the following types of variables:
- Accessing a local variable defined outside the current function in the default parameter values;
- Accessing a local variable defined outside the current function or lambda within the function or lambda;
- Accessing instance member variables or this in a non-member function or lambda defined within a class/struct.
The following scenarios do not constitute variable capture:
- Accessing local variables defined within the current function or lambda;
- Accessing formal parameters of the current function or lambda;
- Accessing global variables and static member variables;
- Accessing instance member variables within instance member functions or properties. Since instance member functions or properties receive this as a parameter, all instance member variables are accessed via this within them.
Variable capture occurs at the time of closure definition, thus adhering to the following rules:
- The captured variable must be visible at the time of closure definition, otherwise a compilation error occurs;
- The captured variable must be fully initialized at the time of closure definition, otherwise a compilation error occurs.
Example 1: The closure add captures the local variable num, which is declared with let. The closure is later returned from the function and invoked outside the scope where num was defined, yet it can still access num normally.
func returnAddNum(): (Int64) -> Int64 {
let num: Int64 = 10
func add(a: Int64) {
return a + num
}
add
}
main() {
let f = returnAddNum()
println(f(10))
}
The program outputs:
20
Example 2: Captured variables must be visible at the time of closure definition.
func f() {
let x = 99
func f1() {
println(x)
}
let f2 = { =>
println(y) // Error, cannot capture 'y' which is not defined yet
}
let y = 88
f1() // Print 99
f2()
}
Example 3: Captured variables must be initialized before the closure definition.
func f() {
let x: Int64
func f1() {
println(x) // Error, x is not initialized yet
}
x = 99
f1()
}
Example 4: If the captured variable is a reference type, the value of its mutable instance member variables can be modified.
class C {
public var num: Int64 = 0
}
func returnIncrementer(): () -> Unit {
let c: C = C()
func incrementer() {
c.num++
}
incrementer
}
main() {
let f = returnIncrementer()
f() // c.num increases by 1
}
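The effect of Example 4 becomes observable if the closure also returns the updated value. The following is a minimal sketch along the same lines; the names Tally, makeCounter, and next are hypothetical and introduced only for illustration:

```cangjie
class Tally {
    public var count: Int64 = 0
}

func makeCounter(): () -> Int64 {
    // t is declared with let, so the closure can escape;
    // only the mutable member of the referenced instance changes.
    let t = Tally()
    func next(): Int64 {
        t.count++
        t.count
    }
    next
}

main() {
    let next = makeCounter()
    println(next()) // 1
    println(next()) // 2
}
```

Holding the mutable state in a class instance bound with let is one way to keep state across calls, since a closure that captures a var variable directly cannot be returned.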
Example 5: To prevent closures capturing variables declared with var from escaping, such closures can only be invoked and cannot be used as first-class citizens. This includes prohibiting assignment to variables, use as arguments or return values, and direct use of the closure name as an expression. A closure is considered escaping if it can be invoked outside the function after the function has completed execution.
func f() {
var x = 1
let y = 2
func g() {
println(x) // OK, captured a mutable variable.
}
let b = g // Error, g cannot be assigned to a variable
g // Error, g cannot be used as an expression
g() // OK, g can be invoked
g // Error, g cannot be used as a return value
}
Example 6: Note that capture is transitive. If a function f invokes a function g that captures a var variable, and the var variable captured by g is not defined within f, then f also captures the var variable. In this case, f cannot be used as a first-class citizen either.
Example 6.1: g captures the var-declared variable x, f invokes g, and the x captured by g is not defined within f. Thus, f also cannot be used as a first-class citizen:
func h(){
var x = 1
func g() { x } // captured a mutable variable
func f() {
g() // invoked g
}
return f // Error
}
Example 6.2: g captures the var-declared variable x, and f invokes g. However, the x captured by g is defined within f, and f does not capture any other var-declared variables. Therefore, f can still be used as a first-class citizen:
func h(){
func f() {
var x = 1
func g() { x } // captured a mutable variable
g()
}
return f // OK
}
Example 7: Accessing static member variables and global variables does not constitute variable capture. Therefore, functions or lambdas accessing var-declared global variables or static member variables can still be used as first-class citizens.
class C {
static public var a: Int32 = 0
static public func foo() {
a++ // OK
return a
}
}
var globalV1 = 0
func countGlobalV1() {
globalV1++
C.a = 99
let g = C.foo // OK
}
func g(){
let f = countGlobalV1 // OK
f()
}
Function Call Syntactic Sugar
Trailing Lambda
Trailing lambda syntax can make function calls appear as if they were built-in language constructs, enhancing the language’s extensibility.
When the last parameter of a function is of function type, and the corresponding argument in the function call is a lambda, the trailing lambda syntax can be used by placing the lambda outside the parentheses at the end of the function call.
For example, the following defines a myIf function where the first parameter is of type Bool and the second is a function type. When the first parameter is true, it returns the value from calling the second parameter; otherwise, it returns 0. The myIf function can be called either as a regular function or using trailing lambda syntax.
func myIf(a: Bool, fn: () -> Int64) {
if(a) {
fn()
} else {
0
}
}
func test() {
myIf(true, { => 100 }) // General function call
myIf(true) { // Trailing lambda call
100
}
}
When a function call has exactly one lambda argument, the () can be omitted, leaving only the lambda.
Example:
func f(fn: (Int64) -> Int64) { fn(1) }
func test() {
f { i => i * i }
}
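Trailing lambdas also work when the lambda takes parameters and other arguments precede it. The following is a minimal sketch of a loop-like helper; repeatTimes and action are hypothetical names introduced for illustration:

```cangjie
// Calls action once for each index in the half-open range 0..n.
func repeatTimes(n: Int64, action: (Int64) -> Unit): Unit {
    for (i in 0..n) {
        action(i)
    }
}

main() {
    // The last argument is a lambda, so it can be written after the parentheses.
    repeatTimes(3) { i =>
        println("iteration ${i}")
    }
}
```

The call prints iteration 0, iteration 1, and iteration 2 on separate lines, and it reads much like a built-in loop construct.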
Flow Expressions
Flow operators include two types: the infix operator |> (called pipeline) representing data flow and the infix operator ~> (called composition) representing function composition.
Pipeline Expressions
When a series of transformations needs to be applied to input data, pipeline expressions can simplify the description. The syntax for a pipeline expression is e1 |> e2, which is syntactic sugar for let v = e1; e2(v).
Here, e2 is an expression of function type, and the type of e1 must be a subtype of e2’s parameter type.
Example:
func inc(x: Array<Int64>): Array<Int64> { // Increasing the value of each element in the array by '1'
let s = x.size
var i = 0
for (e in x where i < s) {
x[i] = e + 1
i++
}
x
}
func sum(y: Array<Int64>): Int64 { // Get the sum of elements in the array
var s = 0
for (j in y) {
s += j
}
s
}
let arr: Array<Int64> = [1, 3, 5]
let res = arr |> inc |> sum // res = 12
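To make the equivalence with nested calls explicit, the following minimal sketch compares a pipeline with the call it stands for. The functions addOne and twice are hypothetical names introduced for illustration:

```cangjie
func addOne(x: Int64): Int64 { x + 1 }
func twice(x: Int64): Int64 { x * 2 }

main() {
    let piped = 5 |> twice |> addOne // evaluated left to right
    let nested = addOne(twice(5))    // the equivalent nested call
    println(piped)           // 11
    println(piped == nested) // true
}
```

The pipeline lists the transformations in the order they are applied, whereas the nested call lists them in reverse.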
Composition Expressions
Composition expressions represent the composition of two single-parameter functions. The syntax for a composition expression is f ~> g, which is equivalent to { x => g(f(x)) }.
Here, both f and g must be expressions of function type with exactly one parameter.
For f and g to compose, the return type of f must be a subtype of the parameter type of g.
Example 1:
func f(x: Int64): Float64 {
Float64(x)
}
func g(x: Float64): Float64 {
x
}
var fg = f ~> g // The same as { x: Int64 => g(f(x)) }
Example 2:
func f(x: Int64): Float64 {
Float64(x)
}
let lambdaComp = {x: Int64 => x} ~> f // The same as { x: Int64 => f({x: Int64 => x}(x)) }
Example 3:
func h1<T>(x: T): T { x }
func h2<T>(x: T): T { x }
var hh = h1<Int64> ~> h2<Int64> // The same as { x: Int64 => h2<Int64>(h1<Int64>(x)) }
Note:
In the expression f ~> g, f is evaluated first, followed by g, and then the function composition is performed.
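Since f ~> g applies f first and g second, swapping the operands generally produces a different function. The following is a minimal sketch; addOne and twice are hypothetical names introduced for illustration:

```cangjie
func addOne(x: Int64): Int64 { x + 1 }
func twice(x: Int64): Int64 { x * 2 }

main() {
    let addThenDouble = addOne ~> twice // { x => twice(addOne(x)) }
    let doubleThenAdd = twice ~> addOne // { x => addOne(twice(x)) }
    println(addThenDouble(3)) // 8
    println(doubleThenAdd(3)) // 7
}
```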
Additionally, flow operators cannot be used directly with functions that have named parameters without default values, because such functions require the named arguments to be provided explicitly. For example:
func f(a!: Int64): Unit {}
var a = 1 |> f // Error
If needed, developers can pass named arguments to the f function via a lambda expression:
func f(a!: Int64): Unit {}
var x = 1 |> { x: Int64 => f(a: x) } // OK
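For the same reason, when the only parameter of f is a named parameter, using f directly with flow operators is incorrect even if that parameter has a default value, because the piped value would have to be passed as a named argument: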
func f(a!: Int64 = 2): Unit {}
var a = 1 |> f // Error
However, when all named parameters have default values, the function can be called without providing named arguments, requiring only non-named parameters. Such functions can be used with flow operators:
func f(a: Int64, b!: Int64 = 2): Unit {}
var a = 1 |> f // OK
Of course, if you want to pass other arguments to parameter b when calling f, you still need to use a lambda expression:
func f(a: Int64, b!: Int64 = 2): Unit {}
var a = 1 |> {x: Int64 => f(x, b: 3)} // OK
Variadic Parameters
Variadic parameters are a special function call syntactic sugar. When the last non-named parameter of a function is of type Array, the corresponding argument position can directly accept a sequence of arguments instead of an Array literal (the number of arguments can be zero or more). Example:
func sum(arr: Array<Int64>) {
var total = 0
for (x in arr) {
total += x
}
return total
}
main() {
println(sum())
println(sum(1, 2, 3))
}
Program output:
0
6
Note that only the last non-named parameter can be a variadic parameter. Named parameters cannot use this syntactic sugar.
func length(arr!: Array<Int64>) {
return arr.size
}
main() {
println(length()) // Error, expected 1 argument, found 0
println(length(1, 2, 3)) // Error, expected 1 argument, found 3
}
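The Array parameter does not have to be the only parameter; it only has to be the last non-named one. The following is a minimal sketch in which an ordinary non-named parameter precedes it; sumFrom, base, and rest are hypothetical names introduced for illustration:

```cangjie
func sumFrom(base: Int64, rest: Array<Int64>): Int64 {
    var total = base
    for (x in rest) {
        total += x
    }
    total
}

main() {
    println(sumFrom(10))         // 10, rest receives an empty array
    println(sumFrom(10, 1, 2))   // 13, rest receives [1, 2]
    println(sumFrom(10, [1, 2])) // 13, passing an Array literal still works
}
```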
Variadic parameters can appear in global functions, static member functions, instance member functions, local functions, constructors, function variables, lambdas, function call operator overloads, and index operator overloads. They are not supported in other operator overloads, composition, or pipeline calls. Example:
class Counter {
var total = 0
init(data: Array<Int64>) { total = data.size }
operator func ()(data: Array<Int64>) { total += data.size }
}
main() {
let counter = Counter(1, 2)
println(counter.total)
counter(3, 4, 5)
println(counter.total)
}
Program output:
2
5
Function overload resolution always prioritizes functions that can match without using variadic parameters. Only when no functions match will the compiler attempt to resolve using variadic parameters. Example:
func f<T>(x: T) where T <: ToString {
println("item: ${x}")
}
func f(arr: Array<Int64>) {
println("array: ${arr}")
}
main() {
f()
f(1)
f(1, 2)
}
Program output:
array: []
item: 1
array: [1, 2]
When the compiler cannot resolve the ambiguity, it will report an error:
func f(arr: Array<Int64>) { arr.size }
func f(first: Int64, arr: Array<Int64>) { first + arr.size }
main() {
println(f(1, 2, 3)) // Error
}
Function Overloading
Definition of Function Overloading
In the Cangjie programming language, when multiple function definitions share the same name within a scope, this phenomenon is called function overloading.
- Two functions constitute overloading if they share the same name but have different parameters (either differing in parameter count or having the same count but different parameter types). Example:
// Scenario 1
func f(a: Int64): Unit {}
func f(a: Float64): Unit {}
func f(a: Int64, b: Float64): Unit {}
- For two generic functions with the same name (see the Generic Functions chapter), if after renaming the generic type parameters of one function (to make the generic parameter order identical), their non-generic parts have different function parameters, they constitute overloading. Otherwise, these two generic functions result in a duplicate definition error (type argument constraints are not considered in this judgment). Example:
// Scenario 2
interface I1{}
interface I2{}
func f1<X, Y>(a: X, b: Y) {}
func f1<Y, X>(a: X, b: Y) {} // OK: after rename generic type parameter, it will be 'func f1<X, Y>(a: Y, b: X)'
func f2<T>(a: T) where T <: I1 {} // Error, not overloading
func f2<T>(a: T) where T <: I2 {} // Error, not overloading
- Two constructors within the same class with different parameters constitute overloading. Example:
// Scenario 3
class C {
var a: Int64
var b: Float64
public init(a: Int64, b: Float64) {
this.a = a
this.b = b
}
public init(a: Int64) {
b = 0.0
this.a = a
}
}
- The primary constructor and an init constructor within the same class with different parameters constitute overloading (the primary constructor and the init function are considered to share the same name). Example:
// Scenario 4
class C {
C(var a!: Int64, var b!: Float64) {
this.a = a
this.b = b
}
public init(a: Int64) {
b = 0.0
this.a = a
}
}
- Two functions with the same name but different parameters defined in different scopes constitute overloading in a scope where both functions are visible. Example:
// Scenario 5
func f(a: Int64): Unit {}
func g() {
func f(a: Float64): Unit {}
}
- If a subclass contains a function with the same name as its parent class but with different parameter types, this constitutes function overloading. Example:
// Scenario 6
open class Base {
public func f(a: Int64): Unit {}
}
class Sub <: Base {
public func f(a: Float64): Unit {}
}
Only function declarations can introduce function overloading. The following scenarios do not constitute overloading, and two names that do not constitute overloading cannot be defined or declared in the same scope:
- Static member functions and instance member functions of class, interface, or struct types cannot overload each other.
- Constructors, static member functions, and instance member functions of enum types cannot overload each other.
In the following example, both variables are of function type with different parameter types, but since they are not function declarations, they cannot overload each other. The example will result in a compilation error (redefinition error):
main() {
var f: (Int64) -> Unit
var f: (Float64) -> Unit
}
In the following example, although variable f is of function type, variables and functions cannot share the same name. The example will result in a compilation error (redefinition error):
main() {
var f: (Int64) -> Unit
func f(a: Float64): Unit {} // Error, functions and variables cannot have the same name
}
In the following example, the static member function f and instance member function f have different parameter types, but since static and instance member functions within a class cannot overload each other, the example will result in a compilation error:
class C {
public static func f(a: Int64): Unit {}
public func f(a: Float64): Unit {}
}
Function Overload Resolution
When a function is called, all callable functions (those visible in the current scope and passing type checks) form a candidate set. To determine which function in the candidate set to select, function overload resolution follows these rules:
-
Prefer functions in higher-priority scopes. In nested expressions or functions, inner scopes have higher priority.
In the following example, when calling g(Sub()) within the inner function body, the candidate set includes both the g function defined inside inner and the g function defined outside inner. Resolution selects the g function inside inner, which is in the higher-priority scope.
open class Base {}
class Sub <: Base {}
func outer() {
    func g(a: Sub) {
        print("1")
    }
    func inner() {
        func g(a: Base) {
            print("2")
        }
        g(Sub()) // Output: 2
    }
}
-
If multiple functions exist in the highest-priority scope, select the best-matching function (for functions f and g with given arguments, if g can always be called whenever f can be called, but not vice versa, then f is considered a better match than g). If no unique best-matching function exists, an error is reported.
In the following example, two g functions are defined in the same scope, and the better-matching g(a: Sub): Unit is selected.
open class Base {}
class Sub <: Base {}
func outer() {
    func g(a: Sub) {
        print("1")
    }
    func g(a: Base) {
        print("2")
    }
    g(Sub()) // Output: 1
}
-
Subclasses and parent classes are considered the same scope. In the following example, one g function is defined in the parent class, and another g function is defined in the subclass. When calling s.g(Sub()), both g functions are resolved at the same scope level, and the better-matching g(a: Sub): Unit from the parent class is selected.
open class Base {
    public func g(a: Sub) {
        print("1")
    }
}
class Sub <: Base {
    public func g(a: Base) {
        print("2")
    }
}
func outer() {
    let s: Sub = Sub()
    s.g(Sub()) // Output: 1
}
Operator Overloading
If you want to support operators that are not natively supported by a certain type, you can achieve this through operator overloading.
To overload an operator for a type, you can define a function with the operator’s name for that type. When an instance of this type uses the operator, the corresponding operator function will be automatically called.
The definition of an operator function is similar to that of a regular function, with the following differences:
- The operator modifier must be added before the func keyword when defining an operator function.
- The number of parameters in the operator function must match the requirements of the corresponding operator (see Appendix Operators for details).
- Operator functions can only be defined within class, interface, struct, enum, and extend.
- Operator functions have the semantics of instance member functions, so the static modifier is prohibited.
- Operator functions cannot be generic functions.
Additionally, it’s important to note that overloaded operators do not change their inherent precedence and associativity (see Appendix Operators for details).
Definition and Usage of Operator Overloading Functions
There are two ways to define operator functions:
- For types that can directly contain function definitions (including struct, enum, class, and interface), operator functions can be defined directly within them to overload operators.
- Use the extend approach to add operator functions, thereby overloading operators for these types. For types that cannot directly contain function definitions (i.e., types other than struct, class, enum, and interface) or types whose implementations cannot be modified (such as third-party defined struct, class, enum, and interface), this is the only available method (see Extensions).
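As a minimal sketch of the second approach, the following assumes a hypothetical Vec2 struct (not part of the original text) whose definition is treated as fixed, and adds a * operator to it through an extension. The names Vec2, x, y, and k are illustrative only.

```cangjie
// Hypothetical type whose definition we assume cannot be edited.
struct Vec2 {
    Vec2(let x: Int64, let y: Int64) {}
}

// The operator function is added from outside the type definition.
extend Vec2 {
    operator func *(k: Int64): Vec2 {
        Vec2(x * k, y * k)
    }
}

main() {
    let v = Vec2(1, 2) * 3
    println(v.x) // 3
    println(v.y) // 6
}
```

The operator function still takes exactly one parameter (the right operand), since the left operand is the Vec2 instance itself.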
The conventions for parameter types in operator functions are as follows:
-
For unary operators, the operator function takes no parameters, and there are no requirements on the return type.
-
For binary operators, the operator function takes exactly one parameter, and there are no requirements on the return type.
The following example demonstrates the definition and usage of unary and binary operators:
The - operator negates the x and y member variables of a Point instance and returns a new Point object. The + operator sums the x and y member variables of two Point instances and returns a new Point object.
open class Point {
    var x: Int64 = 0
    var y: Int64 = 0
    public init (a: Int64, b: Int64) {
        x = a
        y = b
    }
    public operator func -(): Point {
        Point(-x, -y)
    }
    public operator func +(right: Point): Point {
        Point(this.x + right.x, this.y + right.y)
    }
}
Now, the unary - operator and binary + operator can be used directly on instances of Point:
main() {
    let p1 = Point(8, 24)
    let p2 = -p1      // p2 = Point(-8, -24)
    let p3 = p1 + p2  // p3 = Point(0, 0)
}
-
The index operator ([]) has two forms: value retrieval (let a = arr[i]) and value assignment (arr[i] = a). These are distinguished by the presence or absence of a special named parameter value. Overloading the index operator does not require overloading both forms simultaneously; you can overload only the assignment form or only the retrieval form.
For the value retrieval form of the index operator [], the parameter sequence inside the brackets corresponds to the non-named parameters of the operator overload. There can be one or more parameters of any type. No other named parameters are allowed. The return type can be any type.
class A {
    operator func [](arg1: Int64, arg2: String): Int64 {
        return 0
    }
}
func f() {
    let a = A()
    let b: Int64 = a[1, "2"] // b == 0
}
For the value assignment form of the index operator [], the parameter sequence inside the brackets corresponds to the non-named parameters of the operator overload. There can be one or more parameters of any type. The expression on the right side of = corresponds to the named parameter of the operator overload. There must be exactly one named parameter, and its name must be value. It cannot have a default value, and value can be of any type. The return type must be Unit.
Note that value is just a special marker; you do not need to use named parameter syntax when calling the index operator for assignment.
class A {
    operator func [](arg1: Int64, arg2: String, value!: Int64): Unit {
        return
    }
}
func f() {
    let a = A()
    a[1, "2"] = 0
}
Notably, immutable types (except enum) do not support overloading the assignment form of the index operator.
-
The function call operator (()) overload function can have input parameters and return values of any type. Example:
open class A {
    public init() {}
    public operator func ()(): Unit {}
}
func test1() {
    let a = A() // OK, A() calls the constructor of A
    a()         // OK, a() calls the () operator overload function
}
You cannot use this or super to call the () operator overload function. Example:
open class A {
    public init() {}
    public init(x: Int64) {
        this() // OK, this() calls the constructor of A
    }
    public operator func ()(): Unit {}
    public func foo() {
        this()  // Error, this() calls the constructor of A
        super() // Error
    }
}
class B <: A {
    public init() {
        super() // OK, super() calls the constructor of the superclass
    }
    public func goo() {
        super() // Error
    }
}
For enumeration types, when both the constructor form and the () operator overload function form are applicable, the constructor form takes precedence. Example:
enum E {
    Y | X | X(Int64)
    public operator func ()(p: Int64) {}
    public operator func ()(p: Float64) {}
}
main() {
    let e = X(1) // OK, X(1) calls the constructor X(Int64)
    X(1.0)       // OK, X(1.0) calls the () operator overload function
    let e1 = X
    e1(1)        // OK, e1(1) calls the () operator overload function
    Y(1)         // OK, Y(1) calls the () operator overload function
}
Overloadable Operators
The following table lists all overloadable operators (ordered by precedence from highest to lowest):
| Operator | Description |
|---|---|
| () | Function call |
| [] | Indexing |
| ! | NOT |
| - | Negative |
| ** | Power |
| * | Multiply |
| / | Divide |
| % | Remainder |
| + | Add |
| - | Subtract |
| << | Bitwise left shift |
| >> | Bitwise right shift |
| < | Less than |
| <= | Less than or equal |
| > | Greater than |
| >= | Greater than or equal |
| == | Equal |
| != | Not equal |
| & | Bitwise AND |
| ^ | Bitwise XOR |
| \| | Bitwise OR |
Important notes:
-
If any binary operator other than relational operators (<, <=, >, >=, ==, and !=) is overloaded for a type, and the return type of the operator function matches the type of the left operand or is a subtype of it, then the type supports the corresponding compound assignment operator. If the return type does not match the left operand's type and is not a subtype, using the corresponding compound assignment operator will result in a type mismatch error.
open class MyClass {
    var x: Int64 = 0
    public init (a: Int64) {
        x = a
    }
    public operator func +(right: MyClass): Int64 { // The above rules are not met
        this.x + right.x
    }
}
main() {
    var a = MyClass(5)
    var b = MyClass(3)
    a += b; // Error, type incompatible in this compound assignment expression
}
-
The Cangjie programming language does not support custom operators. Defining operator functions other than those listed in the above table is not allowed.
-
For a type T, if T already natively supports certain overloadable operators, attempting to redefine operator functions with the same signature via extension will result in a redefinition error. For example, overloading arithmetic operators, bitwise operators, or relational operators with the same signature for numeric types, overloading relational operators with the same signature for Rune, or overloading logical operators, equality, or inequality operators with the same signature for Bool will all trigger redefinition errors.
extend Int64 {
    public operator func +(x: Int64, y: Int64): Int64 { // Error, invalid number of parameters for operator '+'
        x + y
    }
}
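For contrast with the failing MyClass example in the compound-assignment note, here is a minimal sketch of the case where the rule is satisfied. It assumes a hypothetical Counter class (not from the original text) whose + operator returns Counter, the same type as the left operand, so += should be accepted.

```cangjie
// Hypothetical class: '+' returns the left operand's own type.
open class Counter {
    var x: Int64 = 0
    public init(a: Int64) {
        x = a
    }
    public operator func +(right: Counter): Counter {
        Counter(this.x + right.x)
    }
}

main() {
    var a = Counter(5)
    let b = Counter(3)
    a += b        // behaves like a = a + b
    println(a.x)  // 8
}
```

The left operand a must be declared with var, because the compound assignment rebinds it to the result of the operator function.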
const Functions and Constant Evaluation
Constant evaluation allows certain forms of expressions to be evaluated at compile time, reducing the computational load required during program execution. This chapter primarily introduces the usage methods and rules of constant evaluation.
const Context and const Expressions
A const context refers to the initialization expression of a const variable, where these expressions are always evaluated at compile time. Therefore, restrictions must be placed on the expressions allowed in const contexts to avoid side effects such as modifying global state or performing I/O operations, ensuring they can be evaluated at compile time.
A const expression possesses the capability to be evaluated at compile time. Expressions that satisfy the following rules are considered const expressions:
- Literals of numeric types, Bool, Unit, Rune, and String types (excluding interpolated strings).
- Array literals (which cannot be of Array type; VArray type can be used) and tuple literals where all elements are const expressions.
- const variables, const function parameters, and local variables within const functions.
- const functions, including functions declared with const, lambda expressions that meet const function requirements, and function expressions returned by these functions.
- const function calls (including const constructors), where the function expression must be a const expression and all arguments must be const expressions.
- enum constructor calls where all arguments are const expressions, and parameterless enum constructors.
- Arithmetic expressions, relational expressions, and bitwise operation expressions of numeric types, Bool, Unit, Rune, and String types, where all operands must be const expressions.
- if, match, try, throw, return, is, and as expressions, where all internal expressions must be const expressions.
- Member access of const expressions (excluding property access) and index access of tuple.
- const init and this and super expressions within const functions.
- const instance member function calls of const expressions, where all arguments must be const expressions.
Note:
The current compiler implementation does not support using throw as a const expression.
const Functions
const functions are a special category of functions that possess the capability to be evaluated at compile time. When these functions are called in a const context, they are executed during compilation. In other non-const contexts, const functions behave like ordinary functions and are executed at runtime.
The following example demonstrates a const function that calculates the distance between two points on a plane. The distance function uses let to define two local variables, dx and dy:
struct Point {
const Point(let x: Float64, let y: Float64) {}
}
const func distance(a: Point, b: Point) {
let dx = a.x - b.x
let dy = a.y - b.y
(dx ** 2 + dy ** 2) ** 0.5
}
main() {
const a = Point(3.0, 0.0)
const b = Point(0.0, 4.0)
const d = distance(a, b)
println(d)
}
Compilation and execution output:
5.000000
Key points to note:
- const function declarations must be marked with the const modifier.
- Global const functions and static const functions can only access externally declared const variables, including const global variables and const static member variables. Other external variables are inaccessible. const init functions and const instance member functions can access not only const-declared external variables but also instance member variables of the current type.
- All expressions within const functions must be const expressions, except for const init functions.
- const functions can use let and const to declare new local variables but do not support var.
- There are no special restrictions on the parameter types and return types of const functions. If the arguments of a function call do not meet the requirements of a const expression, the function call cannot be used as a const expression but can still be used as an ordinary expression.
- const functions are not necessarily executed at compile time; for example, they can be called at runtime within non-const functions.
- The overloading rules for const functions and non-const functions are consistent.
- Numeric types, Bool, Unit, Rune, String types, and enum support defining const instance member functions.
- For struct and class, const instance member functions can only be defined if const init is defined. const instance member functions in class cannot be open. const instance member functions in struct cannot be mut.
Additionally, interfaces can also define const functions, but they are subject to the following rules:
- For const functions in an interface, the implementing type must also use const functions to satisfy the interface.
- For non-const functions in an interface, the implementing type can use either const or non-const functions to satisfy the interface.
- Similar to static functions in interfaces, const functions in interfaces can only be used by generic parameters or variables constrained by the interface when the interface is used as a generic constraint.
In the following example, two const functions are defined in interface I, class A implements interface I, and the generic function g has a type parameter upper-bounded by I.
interface I {
const func f(): Int64
const static func f2(): Int64
}
class A <: I {
public const func f() { 0 }
public const static func f2() { 1 }
const init() {}
}
const func g<T>(i: T) where T <: I {
return i.f() + T.f2()
}
main() {
println(g(A()))
}
Compiling and executing the above code outputs:
1
const init
If a struct or class defines a const constructor, then instances of that struct/class can be used in const expressions.
-
If the current type is a class, it cannot have instance member variables declared with var; otherwise, defining const init is not allowed. If the current type has a superclass, the const init must call the superclass's const init (either explicitly or implicitly by calling a parameterless const init). If the superclass does not have a const init, an error is raised.
public class Foo {
    val a: Int64 = 9 // Error, expected declaration, found 'val'
    let b: String
    const init(b: String) {
        this.b = b
    }
}
open public class Boo {
    let boo: String
    const init(b: String) {
        this.boo = b
    }
}
public class Foo <: Boo {
    let c: String
    const init(c: String) { // Error, there is no non-parameter constructor in super class, please invoke super call explicitly
        this.c = c
    }
}
-
If the instance member variables of the current type have initial values, those initial values must be const expressions; otherwise, defining const init is not allowed.
var a = "4123"
class Foo {
    let foo: String = a // Error, expected 'const' expression guaranteed to be evaluated at compile time
    const init() {}
}
-
Within const init, assignment expressions (=) can be used to assign values to instance member variables, but no other assignment expressions (such as +=, -=) are allowed.
var a = "4123"
class Foo {
    let foo: String
    let boo: Int64
    const init() {
        foo = "aa" // OK
        boo += 10  // Error, variable 'boo' is used before initialization
    }
}
The difference between const init and const functions is that const init allows assignment to instance member variables (using assignment expressions).
Defining struct Types
The definition of a struct type begins with the keyword struct, followed by the name of the struct, and then the struct body enclosed in a pair of curly braces. The struct body can define a series of member variables, member properties (see Properties), static initializers, constructors, and member functions.
struct Rectangle {
let width: Int64
let height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
width * height
}
}
The above example defines a struct type named Rectangle, which has two member variables of type Int64: width and height, a constructor with two Int64 parameters (defined using the keyword init, where the function body typically initializes member variables), and a member function area (which returns the product of width and height).
Note:
struct can only be defined at the top-level scope of a source file.
struct Member Variables
struct member variables are divided into instance member variables and static member variables (modified with the static modifier). The difference in access is that instance member variables can only be accessed through struct instances (saying a is an instance of type T means a is a value of type T), while static member variables can only be accessed through the struct type name.
Instance member variables can be defined without an initial value (but the type must be annotated, as in the width and height in the example above), or they can be assigned an initial value, for example:
struct Rectangle {
let width = 10
let height = 20
}
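The access difference between the two kinds of member variables can be sketched as follows. The static member sides is a hypothetical addition for illustration; the instance members match the example above.

```cangjie
struct Rectangle {
    static let sides = 4 // hypothetical static member variable
    let width = 10
    let height = 20
}

main() {
    let r = Rectangle()
    println(r.width)         // instance member: accessed through an instance, prints 10
    println(Rectangle.sides) // static member: accessed through the type name, prints 4
}
```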
struct Static Initializers
struct supports defining static initializers, which initialize static member variables through assignment expressions within the static initializer.
A static initializer begins with the keyword combination static init, followed by a parameterless parameter list and a function body, and cannot be modified by access modifiers. The function body must complete the initialization of all uninitialized static member variables; otherwise, a compilation error will occur.
struct Rectangle {
static let degree: Int64
static init() {
degree = 180
}
}
A struct can define at most one static initializer; otherwise, a redefinition error will occur.
struct Rectangle {
static let degree: Int64
static init() {
degree = 180
}
static init() { // Error, redefinition with the previous static init function
degree = 180
}
}
struct Constructors
struct supports two types of constructors: ordinary constructors and primary constructors.
An ordinary constructor begins with the keyword init, followed by a parameter list and a function body. The function body must complete the initialization of all uninitialized instance member variables (if parameter names and member variable names cannot be distinguished, the member variable can be prefixed with this for clarification, where this represents the current instance of the struct); otherwise, a compilation error will occur.
struct Rectangle {
let width: Int64
let height: Int64
public init(width: Int64, height: Int64) { // Error, 'height' is not initialized in the constructor
this.width = width
}
}
A struct can define multiple ordinary constructors, but they must constitute overloads (see Function Overloading); otherwise, a redefinition error will occur.
struct Rectangle {
let width: Int64
let height: Int64
public init(width: Int64) {
this.width = width
this.height = width
}
public init(width: Int64, height: Int64) { // OK: overloading with the first init function
this.width = width
this.height = height
}
public init(height: Int64) { // Error, redefinition with the first init function
this.width = height
this.height = height
}
}
In addition to defining ordinary constructors named init, a struct can also define (at most) one primary constructor. The primary constructor has the same name as the struct type, and its parameter list can include two types of parameters: ordinary parameters and member variable parameters (prefixed with let or var). Member variable parameters serve the dual purpose of defining member variables and acting as constructor parameters.
Using a primary constructor can often simplify the definition of a struct. For example, the above Rectangle with an init constructor can be simplified as follows:
struct Rectangle {
public Rectangle(let width: Int64, let height: Int64) {}
}
The primary constructor’s parameter list can also include ordinary parameters, for example:
struct Rectangle {
public Rectangle(name: String, let width: Int64, let height: Int64) {}
}
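As a small usage sketch, the member variable parameters of a primary constructor can be read like ordinary member variables once an instance is created. The area function and the argument values are assumptions added for illustration.

```cangjie
struct Rectangle {
    // 'width' and 'height' are both constructor parameters and member variables.
    public Rectangle(let width: Int64, let height: Int64) {}

    public func area(): Int64 {
        width * height
    }
}

main() {
    let r = Rectangle(3, 4)
    println(r.width)  // 3
    println(r.area()) // 12
}
```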
If a struct definition does not include any custom constructors (including primary constructors) and all instance member variables have initial values, a parameterless constructor is generated automatically (calling this constructor creates an object in which every instance member variable is set to its initial value); otherwise, this parameterless constructor is not generated. For example, for the following struct definition, the automatically generated parameterless constructor is shown in the comments:
struct Rectangle {
let width: Int64 = 10
let height: Int64 = 10
/* Auto-generated parameterless constructor:
public init() {
}
*/
}
struct Member Functions
struct member functions are divided into instance member functions and static member functions (modified with the static modifier). The differences are:
- Instance member functions can only be accessed through struct instances, while static member functions can only be accessed through the struct type name.
- Static member functions cannot access instance member variables or call instance member functions, but instance member functions can access static member variables and call static member functions.
In the following example, area is an instance member function, and typeName is a static member function.
struct Rectangle {
let width: Int64 = 10
let height: Int64 = 20
public func area() {
this.width * this.height
}
public static func typeName(): String {
"Rectangle"
}
}
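A short sketch of how the two functions would be called, repeating the definition so that the snippet is self-contained (the main function is an addition for illustration):

```cangjie
struct Rectangle {
    let width: Int64 = 10
    let height: Int64 = 20
    public func area() {
        this.width * this.height
    }
    public static func typeName(): String {
        "Rectangle"
    }
}

main() {
    let r = Rectangle()
    println(r.area())             // instance member function, called on an instance: 200
    println(Rectangle.typeName()) // static member function, called on the type name: Rectangle
}
```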
Instance member functions can access instance member variables via this, for example:
struct Rectangle {
let width: Int64 = 1
let height: Int64 = 1
public func area() {
this.width * this.height
}
}
Access Modifiers for struct Members
struct members (including member variables, member properties, constructors, member functions, and operator functions (see Operator Overloading)) can be modified with four access modifiers: private, internal, protected, and public. The default modifier is internal.
- private: Visible only within the struct definition.
- internal: Visible only within the current package and sub-packages (including sub-packages of sub-packages, see Packages).
- protected: Visible within the current module (see Packages).
- public: Visible both inside and outside the module.
In the following example, width is a public member and can be accessed outside the struct; height has the default access modifier and is only visible within the current package and sub-packages (other packages cannot access it); area is private and is visible only within the struct definition.
package a
public struct Rectangle {
public var width: Int64
var height: Int64
private var area: Int64
public init(width: Int64, height: Int64, area: Int64) {
this.width = width
this.height = height
this.area = area
}
}
func samePkgFunc() {
var r = Rectangle(10, 20, 40)
r.width = 8 // OK: public 'width' can be accessed here
r.height = 24 // OK: 'height' has no modifier and can be accessed here
r.area = 30 // Error, private 'area' can't be accessed here
}
package b
import a.*
main() {
var r = Rectangle(10, 20, 40)
r.width = 8 // OK: public 'width' can be accessed here
r.height = 24 // Error, no modifier 'height' can't be accessed here
r.area = 30 // Error, private 'area' can't be accessed here
}
Prohibiting Recursive structs
Recursive and mutually recursive struct definitions are illegal. For example:
struct R1 { // Error, 'R1' recursively references itself
let other: R1
}
struct R2 { // Error, 'R2' and 'R3' are mutually recursive
let other: R3
}
struct R3 { // Error, 'R2' and 'R3' are mutually recursive
let other: R2
}
Creating struct Instances
After defining a struct type, you can create struct instances by calling the struct’s constructor. Outside the struct definition, you create an instance of the type by calling the constructor with the struct type name, and you can access instance member variables and instance member functions that satisfy visibility modifiers (such as public) through the instance. For example, in the following code, a variable r of type Rectangle is defined. You can access the values of width and height in r through r.width and r.height, respectively, and call the member function area of r through r.area().
struct Rectangle {
public var width: Int64
public var height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
width * height
}
}
let r = Rectangle(10, 20)
let width = r.width // width = 10
let height = r.height // height = 20
let a = r.area() // a = 200
If you want to modify the values of member variables through a struct instance, you need to define the struct variable as mutable, and the member variable being modified must also be mutable (defined with var). Here’s an example:
struct Rectangle {
public var width: Int64
public var height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
width * height
}
}
main() {
var r = Rectangle(10, 20) // r.width = 10, r.height = 20
r.width = 8 // r.width = 8
r.height = 24 // r.height = 24
let a = r.area() // a = 192
}
During assignment or parameter passing, a struct instance is copied (for reference-type member variables, only the reference is copied, not the referenced object), generating a new instance. Modifying one instance does not affect the other. For example, in the following code, after assigning r1 to r2, modifying the width and height values of r1 does not affect the width and height values of r2.
struct Rectangle {
public var width: Int64
public var height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
width * height
}
}
main() {
var r1 = Rectangle(10, 20) // r1.width = 10, r1.height = 20
var r2 = r1 // r2.width = 10, r2.height = 20
r1.width = 8 // r1.width = 8
r1.height = 24 // r1.height = 24
let a1 = r1.area() // a1 = 192
let a2 = r2.area() // a2 = 200
}
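The parenthetical remark about reference-type member variables can be sketched as follows. Label and Tagged are hypothetical types introduced only for this illustration: copying the struct copies the value-type member id, but both copies keep referring to the same Label object.

```cangjie
// Hypothetical reference type held inside a struct.
class Label {
    var text: String
    init(text: String) {
        this.text = text
    }
}

struct Tagged {
    var id: Int64
    let label: Label
    init(id: Int64, label: Label) {
        this.id = id
        this.label = label
    }
}

main() {
    var t1 = Tagged(1, Label("a"))
    let t2 = t1             // copies 'id'; copies only the reference held in 'label'
    t1.id = 2               // affects t1 only
    t1.label.text = "b"     // modifies the shared Label object
    println(t2.id)          // 1
    println(t2.label.text)  // b
}
```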
mut Functions
The struct type is a value type, and its instance member functions cannot modify the instance itself. For example, in the following case, the member function g cannot modify the value of member variable i.
struct Foo {
var i = 0
public func g() {
i += 1 // Error, the value of an instance member variable cannot be modified in an instance member function
}
}
A mut function is a special instance member function that can modify the struct instance itself. Inside a mut function, this has special semantics: it can modify the fields of the instance in place.
Note:
mut functions can only be defined within interfaces, structs, and struct extensions (classes are reference types, and their instance member functions can modify instance member variables without requiring mut, so defining mut functions in classes is prohibited).
Definition of mut Functions
Compared to ordinary instance member functions, mut functions are distinguished by an additional mut keyword modifier.
For example, in the following case, adding the mut modifier before function g allows modification of member variable i within the function body.
struct Foo {
var i = 0
public mut func g() {
i += 1 // OK
}
}
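A minimal usage sketch, assuming a hypothetical Counter struct: the mut function is called on a variable declared with var, and each call modifies that variable in place.

```cangjie
// Hypothetical struct with one mut function.
struct Counter {
    var count = 0
    public mut func increment(): Unit {
        count += 1
    }
}

main() {
    var c = Counter()
    c.increment()
    c.increment()
    println(c.count) // 2
}
```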
mut can only modify instance member functions and cannot modify static member functions.
struct A {
public mut func f(): Unit {} // OK
public mut operator func +(rhs: A): A { // OK
A()
}
public mut static func g(): Unit {} // Error, static member functions cannot be modified with 'mut'
}
In mut functions, this cannot be captured or used as an expression. Lambdas or nested functions within mut functions cannot capture instance member variables of the struct.
Example:
struct Foo {
var i = 0
public mut func f(): Foo {
let f1 = { => this } // Error, 'this' in mut functions cannot be captured
let f2 = { => this.i = 2 } // Error, instance member variables in mut functions cannot be captured
let f3 = { => this.i } // Error, instance member variables in mut functions cannot be captured
let f4 = { => i } // Error, instance member variables in mut functions cannot be captured
this // Error, 'this' in mut functions cannot be used as expressions
}
}
mut Functions in Interfaces
Instance member functions in interfaces can also be modified with mut.
When a struct type implements functions from an interface, it must maintain the same mut modifier. Types other than struct cannot use the mut modifier when implementing interface functions.
Example:
interface I {
mut func f1(): Unit
func f2(): Unit
}
struct A <: I {
public mut func f1(): Unit {} // OK: as in the interface, the 'mut' modifier is used
public func f2(): Unit {} // OK: as in the interface, the 'mut' modifier is not used
}
struct B <: I {
public func f1(): Unit {} // Error, 'f1' is modified with 'mut' in interface, but not in struct
public mut func f2(): Unit {} // Error, 'f2' is not modified with 'mut' in interface, but is in struct
}
class C <: I {
public func f1(): Unit {} // OK
public func f2(): Unit {} // OK
}
When a struct instance is assigned to an interface type, it follows copy semantics. Therefore, the mut function of the interface cannot modify the value of the struct instance.
Example:
interface I {
mut func f(): Unit
}
struct Foo <: I {
public var v = 0
public mut func f(): Unit {
v += 1
}
}
main() {
var a = Foo()
var b: I = a
b.f() // Calling 'f' via 'b' cannot modify the value of 'a'
println(a.v) // 0
}
The program output is:
0
Usage Restrictions of mut Functions
Because struct is a value type, if a variable is of struct type and declared with let, the mut functions of that type cannot be accessed via this variable.
Example:
interface I {
mut func f(): Unit
}
struct Foo <: I {
public var i = 0
public mut func f(): Unit {
i += 1
}
}
main() {
let a = Foo()
a.f() // Error, 'a' is of type struct and is declared with 'let', the 'mut' function cannot be accessed via 'a'
var b = Foo()
b.f() // OK
let c: I = Foo()
c.f() // OK, variable 'c' is of interface type I, not struct type, so access is permitted here.
}
To prevent escape, if a variable is of struct type, it cannot use mut functions as first-class citizens; it can only call these mut functions.
Example:
interface I {
mut func f(): Unit
}
struct Foo <: I {
var i = 0
public mut func f(): Unit {
i += 1
}
}
main() {
var a = Foo()
var fn = a.f // Error, mut function 'f' of 'a' cannot be used as a first class citizen.
var b: I = Foo()
fn = b.f // OK
}
To prevent escape, non-mut instance member functions (including lambda expressions) cannot directly access mut functions of their containing type, but the reverse is allowed.
Example:
struct Foo {
var i = 0
public mut func f(): Unit {
i += 1
g() // OK
}
public func g(): Unit {
f() // Error, mut functions cannot be invoked in non-mut functions
}
}
interface I {
mut func f(): Unit {
g() // OK
}
func g(): Unit {
f() // Error, mut functions cannot be invoked in non-mut functions
}
}
Enum Types
This section introduces the enum type in Cangjie. The enum type provides a way to define a type by enumerating all its possible values.
Many programming languages have enum types (or enumerated types), but their usage and expressive power vary across languages. In Cangjie, the enum type can be understood as the algebraic data type (ADT) found in functional programming languages.
Definition of enum
When defining an enum, all its possible values must be explicitly listed. These values are called the constructors of the enum.
An enum type definition starts with the keyword enum, followed by the name of the enum, and then the enum body enclosed in curly braces. The enum body defines several constructors, separated by | (the | before the first constructor is optional). Constructors can be named or unnamed (...).
Each enum must contain at least one named constructor. Named constructors can be parameterless (i.e., “no-argument constructors”) or carry several parameters (i.e., “parameterized constructors”). The following example defines an enum type named RGBColor with three constructors: Red, Green, and Blue, representing the red, green, and blue components in the RGB color model. Each constructor has a UInt8 parameter representing the brightness level of each color.
enum RGBColor {
| Red(UInt8) | Green(UInt8) | Blue(UInt8)
}
Cangjie supports defining multiple constructors with the same name in the same enum, but these constructors must have different numbers of parameters (a parameterless constructor is considered to have 0 parameters). For example:
enum RGBColor {
| Red | Green | Blue
| Red(UInt8) | Green(UInt8) | Blue(UInt8)
}
Each enum can have at most one unnamed ... constructor, and ... must be the last constructor. An enum with a ... constructor is called a non-exhaustive enum. Since it has no name, this constructor cannot be directly matched. During destructuring, patterns that match all constructors must be used, such as the wildcard pattern _ or binding patterns. For details, refer to Definition of match expressions. For example:
enum T {
| Red | Green | Blue | ...
}
enum supports recursive definitions. The following example uses enum to define an expression type (Expr) that has exactly three forms: a single number Num (with an Int64 parameter), an addition expression Add (with two Expr parameters), or a subtraction expression Sub (with two Expr parameters). The parameters of the Add and Sub constructors recursively reference Expr itself.
enum Expr {
| Num(Int64)
| Add(Expr, Expr)
| Sub(Expr, Expr)
}
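As a minimal sketch of how such a recursive definition can be consumed, the following evaluates an Expr by recursing over its constructors. The function name eval is a hypothetical helper introduced here for illustration, and the match expression it relies on is described later in the Match Expressions section.

```cangjie
enum Expr {
    | Num(Int64)
    | Add(Expr, Expr)
    | Sub(Expr, Expr)
}

// Hypothetical helper (not part of the original text): recursively evaluates an Expr.
func eval(e: Expr): Int64 {
    match (e) {
        case Num(n) => n                       // base case: a plain number
        case Add(l, r) => eval(l) + eval(r)    // recurse into both operands
        case Sub(l, r) => eval(l) - eval(r)
    }
}

main() {
    // Represents (1 + 5) - 2
    let e = Sub(Add(Num(1), Num(5)), Num(2))
    println(eval(e)) // 4
}
```

Because the three cases cover every constructor of Expr, no wildcard branch is needed.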
Additionally, the enum body can define a series of member functions, operator functions (see Operator Overloading), and member properties (see Properties). However, constructors, member functions, and member properties must not share the same name. For example, the following example defines a function named printType in RGBColor, which outputs the string RGBColor:
enum RGBColor {
| Red | Green | Blue
public static func printType() {
print("RGBColor")
}
}
Note:
enum can only be defined at the top-level scope of a source file.
Usage of enum
After defining an enum type, you can create instances of this type (i.e., enum values). An enum value can only be one of the constructors defined in the enum type. An enum has no constructor functions of the kind a class or struct has; instead, you create an enum value using TypeName.Constructor or directly using the constructor name (for parameterized constructors, arguments must be provided).
In the following example, RGBColor defines three constructors: two parameterless constructors (Red and Green) and one parameterized constructor (Blue(UInt8)). The main function defines three RGBColor variables r, g, and b. r is initialized with RGBColor.Red, g is initialized directly with Green, and b is initialized with Blue(100):
enum RGBColor {
| Red | Green | Blue(UInt8)
}
main() {
let r = RGBColor.Red
let g = Green
let b = Blue(100)
}
When the type name is omitted, the name of an enum constructor may conflict with type names, variable names, or function names. In such cases, the enum type name must be prefixed to use the constructor; otherwise, the system will choose the definition with the same name (type, variable, or function).
In the following example, only the constructor Blue(UInt8) can be used without the type name. Red and Green(UInt8) cannot be used directly due to name conflicts and must be prefixed with the type name RGBColor.
let Red = 1
func Green(g: UInt8) {
return g
}
enum RGBColor {
| Red | Green(UInt8) | Blue(UInt8)
}
let r1 = Red // Will choose 'let Red'
let r2 = RGBColor.Red // OK: constructed by enum type name
let g1 = Green(100) // Will choose 'func Green'
let g2 = RGBColor.Green(100) // OK: constructed by enum type name
let b = Blue(100) // OK: can be uniquely identified as an enum constructor
In the following example, only the constructor Blue cannot be used directly due to a name conflict and must be prefixed with the type name RGBColor.
class Blue {}
enum RGBColor {
| Red | Green(UInt8) | Blue(UInt8)
}
let r = Red // OK: can be uniquely identified as an enum constructor
let g = Green(100) // OK: can be uniquely identified as an enum constructor
let b = Blue(100) // Will choose constructor of 'class Blue' and report an error
The Option Type
The Option type is defined using an enum with two constructors: Some and None. The Some variant carries a parameter indicating a value is present, while None takes no parameters and represents the absence of a value. The Option type is used when you need to represent that a value of a certain type may or may not exist.
The Option type is defined as a generic enum as follows (the angle brackets contain a type parameter T; for detailed information about generics, refer to Generics):
enum Option<T> {
| Some(T)
| None
}
Here, the parameter type of the Some constructor is the type parameter T. When T is instantiated with different types, different Option types are obtained, such as Option<Int64>, Option<String>, etc.
There is also a shorthand notation for the Option type: prefixing the type name with ?. That is, for any type Ty, ?Ty is equivalent to Option<Ty>. For example, ?Int64 is equivalent to Option<Int64>, ?String is equivalent to Option<String>, and so on.
The following examples demonstrate how to define variables of the Option type:
let a: Option<Int64> = Some(100)
let b: ?Int64 = Some(100)
let c: Option<String> = Some("Hello")
let d: ?String = None
Additionally, although T and Option<T> are different types, when it is explicitly known that an Option<T> value is required in a certain context, you can directly pass a value of type T. The compiler will automatically wrap the T value into an Option<T> using the Some constructor (note: this is not a type conversion). For example, the following definitions are legal (equivalent to the definitions of variables a, b, and c in the previous example):
let a: Option<Int64> = 100
let b: ?Int64 = 100
let c: Option<String> = "Hello"
When there is no explicit type requirement in the context, you cannot use None directly to construct the desired type. In such cases, you should use the None<T> syntax to construct an Option<T> value, for example:
let a = None<Int64> // a: Option<Int64>
let b = None<Bool> // b: Option<Bool>
Finally, for usage of Option, refer to Using Option.
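As a brief preview, and only as a sketch, an Option value can be taken apart with a match expression over its two constructors. The helper valueOr below is a hypothetical name introduced for illustration; it is not a standard library function.

```cangjie
// Hypothetical helper (not part of the original text):
// returns the wrapped value, or 'fallback' when no value is present.
func valueOr(opt: ?Int64, fallback: Int64): Int64 {
    match (opt) {
        case Some(v) => v
        case None => fallback
    }
}

main() {
    let a: ?Int64 = 100      // automatically wrapped as Some(100)
    let b = None<Int64>      // no contextual type, so None<T> is used
    println(valueOr(a, 0))   // 100
    println(valueOr(b, -1))  // -1
}
```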
Pattern Overview
For match expressions containing matching values, the patterns supported after case determine the expressive power of the match expression. This section introduces the patterns supported by Cangjie in sequence, including: constant patterns, wildcard patterns, binding patterns, tuple patterns, type patterns, and enum patterns.
Constant Pattern
A constant pattern can be an integer literal, floating-point literal, character literal, boolean literal, string literal (string interpolation is not supported), or Unit literal.
When using a constant pattern in a match expression containing matching values (refer to match expression), the type of the value represented by the constant pattern must be the same as the type of the value to be matched. The matching succeeds if the value to be matched is equal to the value represented by the constant pattern.
In the following example, based on the value of score (assuming score can only take values between 0 and 100 divisible by 10), the grade of the exam score is output:
main() {
let score = 90
let level = match (score) {
case 0 | 10 | 20 | 30 | 40 | 50 => "D"
case 60 => "C"
case 70 | 80 => "B"
case 90 | 100 => "A" // Matched.
case _ => "Not a valid score"
}
println(level)
}
Compiling and executing the above code outputs:
A
- When the target of pattern matching is a value with static type Rune, both Rune literals and single-character string literals can be used as constant patterns of type Rune.
func translate(n: Rune) {
match (n) {
case "A" => 1
case "B" => 2
case "C" => 3
case _ => -1
}
}
main() {
println(translate(r"C"))
}
Compiling and executing the above code outputs:
3
- When the target of pattern matching is a value with static type Byte, a string literal representing an ASCII character can be used as a constant pattern of type Byte.
func translate(n: Byte) {
match (n) {
case "1" => 1
case "2" => 2
case "3" => 3
case _ => -1
}
}
main() {
println(translate(51)) // UInt32(r'3') == 51
}
Compiling and executing the above code outputs:
3
Wildcard Pattern
The wildcard pattern is represented by an underscore _ and can match any value. The wildcard pattern is typically used as the pattern in the last case to cover situations not matched by other cases. For example, in the constant pattern example matching score values, the last case uses _ to match invalid score values.
Binding Pattern
The binding pattern is represented by id, where id is a valid identifier. Compared to the wildcard pattern, the binding pattern can also match any value, but it binds the matched value to id, allowing access to the bound value via id after =>.
In the following example, the last case uses a binding pattern. Here, the variable n is an id identifier used to bind non-0 values:
main() {
let x = -10
let y = match (x) {
case 0 => "zero"
case n => "x is not zero and x = ${n}" // Matched.
}
println(y)
}
Compiling and executing the above code outputs:
x is not zero and x = -10
When using | to connect multiple patterns, binding patterns cannot be used, nor can they be nested within other patterns, otherwise an error will occur:
main() {
let opt = Some(0)
match (opt) {
case x | x => {} // Error, variable cannot be introduced in patterns connected by '|'
case Some(x) | Some(x) => {} // Error, variable cannot be introduced in patterns connected by '|'
case x: Int64 | x: String => {} // Error, variable cannot be introduced in patterns connected by '|'
}
}
The binding pattern id is equivalent to defining a new immutable variable named id (its scope starts from the introduction point to the end of the case), so id cannot be modified after =>. For example, modifying n in the last case of the following example is not allowed.
main() {
let x = -10
let y = match (x) {
case 0 => "zero"
case n => n = n + 0 // Error, 'n' cannot be modified.
"x is not zero"
}
println(y)
}
For each case branch, the variable scope level after => is the same as the variable scope level introduced before => in the case. Introducing the same name again after => will trigger a redefinition error. For example:
main() {
let x = -10
let y = match (x) {
case 0 => "zero"
case n => let n = 0 // Error, redefinition
println(n)
"x is not zero"
}
println(y)
}
Note:
When the identifier of a pattern is an enum constructor, the pattern will be treated as an enum pattern for matching, not a binding pattern (for details on enum patterns, see the enum pattern section).
enum RGBColor {
| Red | Green | Blue
}
main() {
let x = Red
let y = match (x) {
case Red => "red" // 'Red' is an enum pattern here, not a binding pattern.
case _ => "not red"
}
println(y)
}
Compiling and executing the above code outputs:
red
Tuple Pattern
The tuple pattern is used to match tuple values. Its definition is similar to tuple literals: (p_1, p_2, ..., p_n), where p_1 to p_n (n ≥ 2) are patterns (which can be any pattern introduced in this section, with multiple patterns separated by commas) rather than expressions.
For example, (1, 2, 3) is a tuple pattern containing three constant patterns, and (x, y, _) is a tuple pattern containing two binding patterns and one wildcard pattern.
Given a tuple value tv and a tuple pattern tp, tp matches tv if and only if the value at each position in tv matches the pattern at the corresponding position in tp. For example, (1, 2, 3) can only match the tuple value (1, 2, 3), while (x, y, _) can match any three-element tuple value.
The following example demonstrates the use of tuple patterns:
main() {
let tv = ("Alice", 24)
let s = match (tv) {
case ("Bob", age) => "Bob is ${age} years old"
case ("Alice", age) => "Alice is ${age} years old" // Matched, "Alice" is a constant pattern, and 'age' is a binding pattern.
case (name, 100) => "${name} is 100 years old"
case (_, _) => "someone"
}
println(s)
}
Compiling and executing the above code outputs:
Alice is 24 years old
The same tuple pattern cannot introduce multiple binding patterns with the same name. For example, case (x, x) in the last case of the following example is invalid.
main() {
let tv = ("Alice", 24)
let s = match (tv) {
case ("Bob", age) => "Bob is ${age} years old"
case ("Alice", age) => "Alice is ${age} years old"
case (name, 100) => "${name} is 100 years old"
case (x, x) => "someone" // Error, cannot introduce two binding patterns with the same name; this is a redefinition error.
}
println(s)
}
Type Pattern
The type pattern is used to determine whether the runtime type of a value is a subtype of a certain type. There are two forms of type patterns: _: Type (nesting a wildcard pattern _) and id: Type (nesting a binding pattern id). The difference is that the latter performs variable binding, while the former does not.
For a value v to be matched and a type pattern id: Type (or _: Type), first determine whether the runtime type of v is a subtype of Type. If true, the match is successful; otherwise, it fails. If the match succeeds, the type of v is converted to Type and bound to id (for _: Type, there is no binding operation).
Assume the following two classes, Base and Derived, where Derived is a subclass of Base. The parameterless constructor of Base sets the value of a to 10, and the parameterless constructor of Derived sets the value of a to 20:
open class Base {
var a: Int64
public init() {
a = 10
}
}
class Derived <: Base {
public init() {
a = 20
}
}
The following code demonstrates a successful type pattern match:
func test1() {
var d = Derived()
var r = match (d) {
case b: Base => b.a // Matched.
case _ => 0
}
println("r = ${r}")
}
The following code demonstrates a failed type pattern match:
func test2() {
var b = Base()
var r = match (b) {
case d: Derived => d.a // Type pattern match failed.
case _ => 0 // Matched.
}
println("r = ${r}")
}
main() {
test1()
test2()
}
Compiling and executing the above code yields the following output (the first line is the output of test1, and the second line is the output of test2):
r = 20
r = 0
Enum Pattern
The enum pattern is used to match instances of enum types. Its definition resembles the constructor of an enum: a no-argument constructor C or a parameterized constructor C(p_1, p_2, ..., p_n). The type prefix of the constructor can be omitted. The difference lies in the fact that p_1 to p_n (where n ≥ 1) here are patterns. For example, Some(1) is an enum pattern containing a constant pattern, while Some(x) is an enum pattern containing a binding pattern.
Given an enum instance ev and an enum pattern ep, ep is said to match ev if and only if the constructor name of ev is the same as that of ep, and each value in the parameter list of ev matches the corresponding pattern in ep. For example, Some("one") can only match the Some constructor of type Option<String>, i.e., Option<String>.Some("one"), while Some(x) can match any Some constructor of the Option type.
In the following example, the use of an enum pattern is demonstrated. Since the constructor of x is Year, it will match the first case:
enum TimeUnit {
| Year(UInt64)
| Month(UInt64)
}
main() {
let x = Year(2)
let s = match (x) {
case Year(n) => "x has ${n * 12} months" // Matched.
case TimeUnit.Month(n) => "x has ${n} months"
}
println(s)
}
Compiling and executing the above code yields the following output:
x has 24 months
When multiple enum patterns are connected using |, each pattern must be independent and cannot introduce new variables. This is because | represents an “or” relationship, and a variable introduced in one alternative would not be bound when a different alternative matches. In the following example, the fifth and sixth case branches violate this rule:
enum TimeUnit {
| Year(UInt64)
| Month(UInt64)
}
main() {
let x = Year(2)
let s = match (x) {
case Year(5) => "1:OK"
case Month(m) => "2:OK"
case Year(0) | Year(1) | Month(_) => "3:OK"
case Year(_) => "4:OK"
case Year(2) | Month(m) => "5:invalid" // Error, Variable cannot be introduced in patterns connected by '|'
case Year(n: UInt64) | Month(n: UInt64) => "6:invalid" // Error, Variable cannot be introduced in patterns connected by '|'
}
println(s)
}
In the above example, the second case introduces a new variable m but does not connect it with other patterns using |, making it valid.
When using a match expression to match enum values, the patterns following case must cover all constructors of the enum type being matched. If not fully covered, the compiler will report an error:
enum RGBColor {
| Red | Green | Blue
}
main() {
let c = Green
let cs = match (c) { // Error, Not all constructors of RGBColor are covered.
case Red => "Red"
case Green => "Green"
}
println(cs)
}
Full coverage can be achieved by adding case Blue or by using case _ at the end of the match expression to cover cases not handled by other case statements, as shown below:
enum RGBColor {
| Red | Green | Blue
}
main() {
let c = Blue
let cs = match (c) {
case Red => "Red"
case Green => "Green"
case _ => "Other" // Matched.
}
println(cs)
}
The execution result of the above code is:
Other
Nested Combination of Patterns
Tuple patterns and enum patterns can nest arbitrary patterns. The following code demonstrates the nested combination of different patterns:
enum TimeUnit {
| Year(UInt64)
| Month(UInt64)
}
enum Command {
| SetTimeUnit(TimeUnit)
| GetTimeUnit
| Quit
}
main() {
let command = (SetTimeUnit(Year(2022)), SetTimeUnit(Year(2024)))
match (command) {
case (SetTimeUnit(Year(year)), _) => println("Set year ${year}")
case (_, SetTimeUnit(Month(month))) => println("Set month ${month}")
case _ => ()
}
}
Compiling and executing the above code yields the following output:
Set year 2022
Pattern Refutability
Patterns can be divided into two categories: refutable patterns and irrefutable patterns. Under the premise of type matching, when a pattern may fail to match the value being matched, it is called a refutable pattern; conversely, when a pattern can always match the value being matched, it is called an irrefutable pattern.
For the various patterns introduced above, the following rules apply:
Constant patterns are refutable patterns. For example, in the following example, both 1 in the first case and 2 in the second case may not equal the value of x.
func constPat(x: Int64) {
match (x) {
case 1 => "one"
case 2 => "two"
case _ => "_"
}
}
Wildcard patterns are irrefutable patterns. For example, in the following example, no matter what the value of x is, _ will always match it.
func wildcardPat(x: Int64) {
match (x) {
case _ => "_"
}
}
Binding patterns are irrefutable patterns. For example, in the following example, no matter what the value of x is, the binding pattern a will always match it.
func varPat(x: Int64) {
match (x) {
case a => "x = ${a}"
}
}
Tuple patterns are irrefutable patterns if and only if every pattern they contain is an irrefutable pattern. For example, in the following example, both (1, 2) and (a, 2) may fail to match the value of x, so they are refutable patterns, whereas (a, b) can match any value of x, so it is an irrefutable pattern.
func tuplePat(x: (Int64, Int64)) {
match (x) {
case (1, 2) => "(1, 2)"
case (a, 2) => "(${a}, 2)"
case (a, b) => "(${a}, ${b})"
}
}
Type patterns are refutable patterns. For example, in the following example (assuming Base is the parent class of Derived, and Base implements the interface I), the runtime type of x may be neither Base nor Derived, so both a: Derived and b: Base are refutable patterns.
interface I {}
open class Base <: I {}
class Derived <: Base {}
func typePat(x: I) {
match (x) {
case a: Derived => "Derived"
case b: Base => "Base"
case _ => "Other"
}
}
Enum patterns are irrefutable patterns if and only if the corresponding enum type has only one parameterized constructor, and the other patterns contained in the enum pattern are also irrefutable patterns. For example, for the definitions of E1 and E2 in the following example, A(1) in the function enumPat1 is a refutable pattern, while A(a) is an irrefutable pattern; whereas in the function enumPat2, both B(b) and C(c) are refutable patterns.
enum E1 {
A(Int64)
}
enum E2 {
B(Int64) | C(Int64)
}
func enumPat1(x: E1) {
match (x) {
case A(1) => "A(1)"
case A(a) => "A(${a})"
}
}
func enumPat2(x: E2) {
match (x) {
case B(b) => "B(${b})"
case C(c) => "C(${c})"
}
}
Match Expressions
Definition of Match Expressions
Cangjie supports two types of match expressions: the first is a match expression containing a value to be matched, and the second is a match expression without a value to be matched.
Match Expression with Matching Value:
main() {
let x = 0
match (x) {
case 1 => let r1 = "x = 1"
print(r1)
case 0 => let r2 = "x = 0" // Matched.
print(r2)
case _ => let r3 = "x != 1 and x != 0"
print(r3)
}
}
The match expression starts with the keyword match, followed by the value to be matched, such as x in the example above. x can be any expression. This is followed by several case branches enclosed in curly braces. Each case branch starts with the keyword case, followed by a pattern or multiple patterns of the same kind connected by |. In the example above, 1, 0, and _ are all patterns. For details, see the Pattern Overview chapter. After the pattern comes =>, followed by the operation to be executed when the case branch matches successfully. This can be a series of expressions, variable definitions, or function definitions. The scope of newly defined variables or functions starts from their definition point and ends before the next case. For example, the variable definitions and print function calls in the example above.
In the example above, since the value of x is equal to 0, it matches the second case branch (here, a constant pattern is used, which matches whether the values are equal. For details, see the Constant Pattern chapter). Finally, it outputs x = 0.
Compile and execute the above code, and the output result is:
x = 0
The match expression requires that all matches must be exhaustive, meaning all possible values of the expression to be matched should be considered. When the match expression is not exhaustive, or the compiler cannot determine whether it is exhaustive, a compilation error will occur. In other words, the union of the value ranges covered by all case branches (including pattern guards) should include all possible values of the expression to be matched. A common way to ensure the exhaustiveness of the match expression is to use the wildcard pattern _ in the last case branch, as _ can match any value.
The exhaustiveness of the match expression ensures that there must be a case branch that matches the value to be matched. The following example will result in a compilation error because not all possible values of x are covered by the case branches:
func nonExhaustive(x: Int64) {
match (x) {
case 0 => print("x = 0")
case 1 => print("x = 1")
case 2 => print("x = 2")
}
}
If the type of the matched value includes an enum type and the enum is a non-exhaustive enum, then when matching, a pattern that can match all constructors must be used, such as the wildcard pattern _ or a binding pattern.
enum T {
| Red | Green | Blue | ...
}
func foo(a: T) {
match (a) {
case Red => 0
case Green => 1
case Blue => 2
case _ => -1
}
}
func bar(a: T) {
match (a) {
case Red => 0
case k => -1 // simple binding pattern
}
}
func baz(a: T) {
match (a) {
case Red => 0
case k: T => -1 // binding pattern with nested type pattern
}
}
After the pattern in the case branch, a pattern guard can be used to further judge the matching result. This is optional.
A pattern guard represents an additional condition that must be satisfied after the case matches successfully. It is written as where cond, and the type of the expression cond must be Bool.
When the match expression is executed, the expression after match is sequentially matched with each pattern in the case. If there is a pattern guard, the expression after where must evaluate to true; if there are multiple patterns in the case connected by |, as long as the value to be matched matches one of the patterns, it is considered a successful match. Once a match is successful, the code after => is executed, and then the execution of the match expression is exited, meaning that subsequent case branches will not be matched. If the match is unsuccessful, it will continue to match with the patterns in subsequent case branches until a match is successful. The match expression guarantees that there must be a case branch that matches.
The following example uses enum patterns (for details, see the Enum Pattern chapter). When the parameter value of an RGBColor constructor is greater than or equal to 0, the value is output as is; when it is less than 0, the pattern guard (where cond) of the corresponding case is satisfied and the value is treated as 0:
enum RGBColor {
| Red(Int16) | Green(Int16) | Blue(Int16)
}
main() {
let c = RGBColor.Green(-100)
let cs = match (c) {
case Red(r) where r < 0 => "Red = 0"
case Red(r) => "Red = ${r}"
case Green(g) where g < 0 => "Green = 0" // Matched.
case Green(g) => "Green = ${g}"
case Blue(b) where b < 0 => "Blue = 0"
case Blue(b) => "Blue = ${b}"
}
print(cs)
}
Compile and execute the above code, and the output result is:
Green = 0
Match Expression Without Matching Value:
main() {
let x = -1
match {
case x > 0 => print("x > 0")
case x < 0 => print("x < 0") // Matched.
case _ => print("x = 0")
}
}
Compared to the match expression with a value to be matched, there is no expression to be matched after the keyword match, and what follows case is no longer a pattern, but an expression of type Bool (such as x > 0 and x < 0 in the above code) or _ (representing true). Of course, there is no pattern guard in the case either.
When the match expression without a matching value is executed, the expressions after case are evaluated in sequence until a case branch with an expression value of true is encountered. Once the expression value after a case equals true, the code after => in this case is executed, and then the execution of the match expression is exited (meaning that subsequent case branches will not be evaluated).
In the example above, since the value of x is -1, the expression in the second case branch (i.e., x < 0) evaluates to true, and print("x < 0") is executed.
Compile and execute the above code, and the output result is:
x < 0
Types of Match Expressions
For match expressions (whether with or without a matching value):
- When there is a clear type requirement in the context, the type of the code block after => in each case branch must be a subtype of the type required by the context;
- When there is no clear type requirement in the context, the type of the match expression is the least common parent type of the types of the code blocks after => in each case branch;
- When the value of the match expression is not used, its type is Unit, and there is no requirement for the least common parent type of the types of each branch.
The following examples illustrate these points.
let x = 2
let s: String = match (x) {
case 0 => "x = 0"
case 1 => "x = 1"
case _ => "x != 0 and x != 1" // Matched.
}
In the example above, when defining the variable s, its type is explicitly annotated as String, which is a case where the context type information is clear. Therefore, the type of the code block after => in each case must be a subtype of String. Clearly, the string literals after => in the example meet this requirement.
Here is another example without context type information:
let x = 2
let s = match (x) {
case 0 => "x = 0"
case 1 => "x = 1"
case _ => "x != 0 and x != 1" // Matched.
}
In the example above, when defining the variable s, its type is not explicitly annotated. Since the type of the code block after => in each case is String, the type of the match expression is String, and thus the type of s can be determined to be String.
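The third rule (a match expression whose value is not used) has no example above. The following is a minimal sketch of that case, written under the assumption that the compiler accepts branches of unrelated types here, exactly as the rule states; the concrete branch bodies are chosen only for illustration.

```cangjie
main() {
    let x = 2
    // The value of this match is discarded, so its type is Unit and the
    // branches are not required to share a least common parent type.
    match (x) {
        case 0 => "zero"            // String
        case 1 => 1                 // Int64
        case _ => println("other")  // Unit; this branch is taken for x = 2
    }
    println("done")
}
```

If the same match were instead assigned to a variable, the second rule would apply and the branch types would need a least common parent type.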
Other Usage Scenarios of Patterns
Patterns can be used not only in match expressions but also in variable definitions and for in expressions. For example, the left side of an equals sign is a pattern, and the part between the for keyword and the in keyword is also a pattern. Additionally, conditions in if expressions and while expressions can utilize patterns. For specific examples, refer to the “Conditions Involving let-pattern” section.
However, not all patterns can be used in variable definitions and for in expressions. Only irrefutable patterns are permitted in these contexts. Therefore, only wildcard patterns, binding patterns, irrefutable tuple patterns, and irrefutable enum patterns are allowed.
- Examples of using wildcard patterns in variable definitions and for in expressions:
main() {
let _ = 100
for (_ in 1..5) {
println("0")
}
}
In the above example, a wildcard pattern is used in the variable definition, indicating the creation of a nameless variable (which consequently cannot be accessed later). The for in expression uses a wildcard pattern, meaning elements from 1..5 won’t be bound to any variable (thus their values cannot be accessed within the loop body). Compiling and executing this code yields:
0
0
0
0
- Examples of using binding patterns in variable definitions and for in expressions:
main() {
let x = 100
println("x = ${x}")
for (i in 1..5) {
println(i)
}
}
Here, x in the variable definition and i in the for in expression are both binding patterns. Compiling and executing this code yields:
x = 100
1
2
3
4
- Examples of using irrefutable tuple patterns in variable definitions and for in expressions:
main() {
let (x, y) = (100, 200)
println("x = ${x}")
println("y = ${y}")
for ((i, j) in [(1, 2), (3, 4), (5, 6)]) {
println("Sum = ${i + j}")
}
}
In this example, a tuple pattern is used in the variable definition to destructure (100, 200) and bind its components to x and y, effectively defining two variables. The for in expression employs a tuple pattern to sequentially extract tuple-type elements from [(1, 2), (3, 4), (5, 6)], destructure them, and bind their components to i and j, then output their sum in the loop body. Compiling and executing this code yields:
x = 100
y = 200
Sum = 3
Sum = 7
Sum = 11
- Examples of using irrefutable enum patterns in variable definitions and for in expressions:
enum RedColor {
Red(Int64)
}
main() {
let Red(red) = Red(0)
println("red = ${red}")
for (Red(r) in [Red(10), Red(20), Red(30)]) {
println("r = ${r}")
}
}
Here, an enum pattern is used in the variable definition to destructure Red(0) and bind its constructor parameter (i.e., 0) to red. The for in expression uses an enum pattern to sequentially extract elements from [Red(10), Red(20), Red(30)], destructure them, and bind their constructor parameters to r, then output r in the loop body. Compiling and executing this code yields:
red = 0
r = 10
r = 20
r = 30
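The use of patterns in if conditions, mentioned at the start of this section, can be sketched as follows. Unlike variable definitions and for in expressions, a condition may use a refutable pattern such as Some(v), because the else branch handles the case where the match fails. The function name describe is a hypothetical helper introduced for illustration.

```cangjie
// Hypothetical helper (not part of the original text).
func describe(opt: ?Int64): String {
    // 'Some(v)' is a refutable enum pattern; 'v' is bound only when it matches.
    if (let Some(v) <- opt) {
        "got ${v}"
    } else {
        "nothing"
    }
}

main() {
    println(describe(Some(3))) // got 3
    println(describe(None))    // nothing
}
```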
Classes
The class type is a classic concept in object-oriented programming. Cangjie also supports using class to implement object-oriented programming. The main differences between class and struct are: class is a reference type while struct is a value type, and they behave differently during assignment or parameter passing; class types can inherit from each other, but struct types cannot.
This section sequentially introduces how to define class types, how to create objects, and class inheritance.
Class Definition
A class type definition starts with the keyword class, followed by the class name, and then the class body enclosed in curly braces. The class body can define a series of member variables, member properties (see Properties), static initializers, constructors, member functions, and operator functions (details in Operator Overloading).
class Rectangle {
let width: Int64
let height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
width * height
}
}
The above example defines a class type named Rectangle, which has two Int64 member variables width and height, a constructor with two Int64 parameters, and a member function area (returning the product of width and height).
Note:
class can only be defined at the top-level scope of a source file.
A class modified with abstract is an abstract class. Unlike regular classes, abstract classes can declare abstract functions (without function bodies) in addition to defining normal functions. The open modifier in abstract class definitions is optional, and the sealed modifier can also be used. The sealed modifier indicates that the abstract class can only be inherited within the same package (see Class Inheritance). The following example defines an abstract function foo in the abstract class AbRectangle.
abstract class AbRectangle {
public func foo(): Unit
}
Note:
- Abstract classes cannot define private abstract functions;
- Instances of abstract classes cannot be created;
- Non-abstract subclasses of abstract classes must implement all abstract functions from the parent class.
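The rules in this note can be illustrated with a minimal sketch. The names Shape, Square, area, and describe are hypothetical and chosen only for illustration; they do not come from the text above.

```cangjie
// Hypothetical example: Shape and Square are illustrative names.
abstract class Shape {
    // Abstract function: no body, and not private.
    public func area(): Int64

    // Normal function: it may call the abstract one.
    public func describe(): String {
        "area = ${area()}"
    }
}

class Square <: Shape {
    let side: Int64

    public init(side: Int64) {
        this.side = side
    }

    // A non-abstract subclass must implement every abstract function.
    public func area(): Int64 {
        side * side
    }
}

main() {
    let s: Shape = Square(3)
    println(s.describe())
    // let bad = Shape() // Error: an abstract class cannot be instantiated
}
```

Under these assumptions the program should print area = 9, because describe is resolved on the Square object even though the variable has type Shape.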
Class Member Variables
Class member variables are divided into instance member variables and static member variables. Static member variables are modified with the static modifier, must have initial values if no static initializer is present, and can only be accessed via the type name, as shown in the following example:
class Rectangle {
let width = 10
static let height = 20
}
let l = Rectangle.height // l = 20
Instance member variables can be defined without initial values (but must have type annotations) or with initial values, and can only be accessed via objects (i.e., instances of the class), as shown in the following example:
class Rectangle {
let width = 10
let height: Int64
init(h: Int64) {
height = h
}
}
let rec = Rectangle(20)
let l = rec.height // l = 20
Class Static Initializer
Classes support defining static initializers, where static member variables can be initialized via assignment expressions.
A static initializer starts with the keyword combination static init, followed by an empty parameter list and a function body, and cannot be modified with access modifiers. The function body must initialize all uninitialized static member variables; otherwise, a compilation error occurs.
class Rectangle {
static let degree: Int64
static init() {
degree = 180
}
}
A class can define at most one static initializer; otherwise, a redefinition error occurs.
class Rectangle {
static let degree: Int64
static init() {
degree = 180
}
static init() { // Error, redefinition with the previous static init function
degree = 180
}
}
Class Constructors
Like struct, class also supports defining regular constructors and primary constructors.
A regular constructor starts with the keyword init, followed by a parameter list and a function body. The function body must initialize all uninitialized instance member variables; otherwise, a compilation error occurs.
class Rectangle {
let width: Int64
let height: Int64
public init(width: Int64, height: Int64) { // Error, 'height' is not initialized in the constructor
this.width = width
}
}
A class can define multiple regular constructors, but they must constitute overloads (see Function Overloading); otherwise, a redefinition error occurs.
class Rectangle {
let width: Int64
let height: Int64
public init(width: Int64) {
this.width = width
this.height = width
}
public init(width: Int64, height: Int64) { // OK: overloading with the first init function
this.width = width
this.height = height
}
public init(height: Int64) { // Error, redefinition with the first init function
this.width = height
this.height = height
}
}
In addition to defining regular init constructors, a class can define (at most) one primary constructor. The primary constructor has the same name as the class type, and its parameter list can include two types of parameters: regular parameters and member variable parameters (prefixed with let or var). Member variable parameters serve the dual purpose of defining member variables and constructor parameters.
Using a primary constructor can often simplify class definitions. For example, the above Rectangle with an init constructor can be simplified as follows:
class Rectangle {
public Rectangle(let width: Int64, let height: Int64) {}
}
The primary constructor’s parameter list can also include regular parameters, for example:
class Rectangle {
public Rectangle(name: String, let width: Int64, let height: Int64) {}
}
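As a hedged illustration of how the two kinds of parameters differ, the sketch below uses a regular parameter only inside the constructor body, while the member variable parameters remain available on the object afterwards. The member variable label and the argument values are assumptions added for this example.

```cangjie
class Rectangle {
    // Hypothetical extra member, initialized from the regular parameter.
    let label: String

    public Rectangle(name: String, let width: Int64, let height: Int64) {
        // 'name' is a regular parameter: visible here, but not stored as a member.
        this.label = "${name}: ${width} x ${height}"
    }
}

main() {
    let r = Rectangle("door", 10, 20)
    println(r.label)            // uses the member set in the constructor body
    println(r.width * r.height) // width and height are members defined by the parameters
    // println(r.name)          // Error: 'name' is not a member of Rectangle
}
```

With these inputs the sketch should print door: 10 x 20 followed by 200.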
When creating an instance of a class, the constructor is called, and the following sequence of expressions in the class is executed:
- First, initialize variables defined outside the primary constructor that have default values;
- If the constructor body does not explicitly call a parent class constructor or another constructor of the same class, the parent class’s parameterless constructor super() is called. If the parent class has no parameterless constructor, an error occurs;
- Execute the code in the constructor body.
func foo(x: Int64): Int64 {
println("I'm foo, got ${x}")
x
}
open class A {
init() {
println("I'm A")
}
}
class B <: A {
var x = foo(0)
init() {
x = foo(1)
println("init B finished")
}
}
main() {
B()
0
}
In the above example, when calling B’s constructor, the variable x with a default value is initialized first, causing foo(0) to be called; then the parent class’s parameterless constructor is called, causing A’s constructor to be invoked; finally, the code in the constructor body is executed, causing foo(1) to be called and a string to be printed. Thus, the output of the example is:
I'm foo, got 0
I'm A
I'm foo, got 1
init B finished
If a class definition contains no custom constructors (including primary constructors) and all instance member variables have initial values, a parameterless constructor is automatically generated (calling this constructor creates an object where all instance member variables have their initial values); otherwise, this parameterless constructor is not generated. For example, the following class definition will have an auto-generated parameterless constructor:
class Rectangle {
let width = 10
let height = 20
/* Auto-generated parameterless constructor:
public init() {
}
*/
}
// Invoke the auto-generated parameterless constructor
let r = Rectangle() // r.width = 10, r.height = 20
Class Finalizer
Classes support defining finalizers, which are triggered when an instance of the class is garbage-collected. The finalizer’s function name is fixed as ~init and is typically used to release system resources. The following example uses unsafe (details in Unsafe Section):
class C {
var p: CString
init(s: String) {
p = unsafe { LibC.mallocCString(s) }
println(s)
}
~init() {
unsafe { LibC.free(p) }
}
}
There are some restrictions on using finalizers that developers should note:
- Finalizers have no parameters, no return type, no generic type parameters, no modifiers, and cannot be explicitly called.
- Classes with finalizers cannot be modified with open; only non-open classes can have finalizers.
- A class can define at most one finalizer.
- Finalizers cannot be defined in extensions.
- The timing of finalizer execution is indeterminate.
- Finalizers may execute on any thread.
- The execution order of multiple finalizers is indeterminate.
- Throwing uncaught exceptions from finalizers is undefined behavior.
- Creating threads or using thread synchronization in finalizers is undefined behavior.
- If an object remains accessible after its finalizer executes, this is undefined behavior.
- If an object throws an exception during initialization, the finalizer for the incompletely initialized object will not execute.
- Relying on finalizers for synchronization is undefined behavior. In the following example, the main function waits for the finalizer in the Test class to modify the value of t0 via while (Test.t0 != 0), which is undefined behavior.
import std.collection.ArrayList
import std.runtime.gc
class Test {
    public static var t0: Int32 = 0
    public init() {
        t0++
    }
    ~init() {
        t0--
    }
}
var list: ArrayList<Test> = ArrayList<Test>()
func foo(): Int32 {
    let o1 = Test()
    list.add(o1)
    if (Test.t0 != 1) {
        return 1
    }
    list.remove(at: 0)
    return 0
}
main(): Int64 {
    var i: Int64 = 0
    while (i < 100) {
        if (foo() != 0) {
            print("fail: obj is freed before gc!")
            return 1
        }
        gc(heavy: true) // blocking gc expected
        // wait ~init() to be executed
        while (Test.t0 != 0) { // error, this is undefined behavior
            continue
        }
        i++
    }
    return 0
}
Class Member Functions
Class member functions are similarly divided into instance member functions and static member functions (modified with the static modifier). Instance member functions can only be accessed through objects, while static member functions can only be accessed through the class type name. Static member functions cannot access instance member variables or call instance member functions, but instance member functions can access static member variables and call static member functions.
In the following example, area is an instance member function, and typeName is a static member function.
class Rectangle {
let width: Int64 = 10
let height: Int64 = 20
public func area() {
this.width * this.height
}
public static func typeName(): String {
"Rectangle"
}
}
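The access rules for the two kinds of member functions can be sketched as follows. The class Counter and its members are hypothetical names introduced only for this illustration.

```cangjie
// Hypothetical example: Counter is an illustrative name.
class Counter {
    static var created: Int64 = 0
    let id: Int64

    public init() {
        // Instance code may read and write static member variables.
        Counter.created += 1
        this.id = Counter.created
    }

    // Instance member function: may call static member functions.
    public func label(): String {
        "${Counter.typeName()} ${id}"
    }

    // Static member function: cannot touch 'id' or call 'label()'.
    public static func typeName(): String {
        "Counter"
    }
}

main() {
    let a = Counter()
    let b = Counter()
    println(a.label())        // called through an object
    println(b.label())
    println(Counter.created)  // accessed through the type name
}
```

Under these assumptions the sketch should print Counter 1, Counter 2, and 2, in that order.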
Instance member functions can be further categorized into abstract member functions and non-abstract member functions based on whether they have a function body. Abstract member functions lack a function body and can only be defined in abstract classes or interfaces (see Interfaces for details). Note that abstract instance member functions inherently have open semantics, so the open modifier is optional; when declared in an abstract class, they must be modified with public or protected.
Non-abstract functions must have a function body. Within the function body, instance member variables can be accessed via this. For example:
class Rectangle {
let width: Int64 = 10
let height: Int64 = 20
public func area() {
this.width * this.height
}
}
Access Modifiers for Class Members
For class members (including member variables, member properties, constructors, and member functions), four access modifiers can be used: private, internal, protected, and public. The default is internal.
- private: Visible only within the class definition.
- internal: Visible only within the current package and its sub-packages (including sub-packages of sub-packages; see Packages).
- protected: Visible within the current module (see Packages) and to subclasses of the current class.
- public: Visible both inside and outside the module.
package a
public open class Rectangle {
public var width: Int64
protected var height: Int64
private var area: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
this.area = this.width * this.height
}
init(width: Int64, height: Int64, multiple: Int64) {
this.width = width
this.height = height
this.area = width * height * multiple
}
}
func samePkgFunc() {
var r = Rectangle(10, 20) // OK: constructor 'Rectangle' can be accessed here
r.width = 8 // OK: public 'width' can be accessed here
r.height = 24 // OK: protected 'height' can be accessed here
r.area = 30 // Error, private 'area' cannot be accessed here
}
package b
import a.*
public class Cuboid <: Rectangle {
private var length: Int64
public init(width: Int64, height: Int64, length: Int64) {
super(width, height)
this.length = length
}
public func volume() {
this.width * this.height * this.length // OK: protected 'height' can be accessed here
}
}
main() {
var r = Rectangle(10, 20, 2) // Error, Rectangle has no `public` constructor with three parameters
var c = Cuboid(20, 20, 20)
c.width = 8 // OK: public 'width' can be accessed here
c.height = 24 // Error, protected 'height' cannot be accessed here
c.area = 30 // Error, private 'area' cannot be accessed here
}
The This Type
Within a class, the This type placeholder is supported, representing the current class type. It can only be used as the return type of instance member functions. When a subclass object calls a function defined in the parent class that returns a This type, the type of the function call is recognized as the subclass type, not the parent class type where it was defined.
If an instance member function does not declare a return type and only contains expressions returning This, the function’s return type is inferred as This. Example:
open class C1 {
func f(): This { // its type is `() -> C1`
return this
}
func f2() { // its type is `() -> C1`
return this
}
public open func f3(): C1 {
return this
}
}
class C2 <: C1 {
// member function f is inherited from C1, and its type is `() -> C2` now
public override func f3(): This { // OK
return this
}
}
main() {
var obj1: C2 = C2()
var obj2: C1 = C2()
var x = obj1.f() // During compilation, the type of x is C2
var y = obj2.f() // During compilation, the type of y is C1
}
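One place where the This type tends to be useful is call chaining across a class hierarchy. The following is a minimal sketch under that assumption; Builder, HtmlBuilder, add, and bold are hypothetical names, not part of the text above.

```cangjie
// Hypothetical example: a small chaining API built on the This type.
open class Builder {
    var parts = ""

    public func add(p: String): This {
        parts += p
        return this
    }
}

class HtmlBuilder <: Builder {
    public func bold(p: String): This {
        parts += "<b>${p}</b>"
        return this
    }
}

main() {
    let b = HtmlBuilder()
    // add() is defined in Builder, but called on an HtmlBuilder it has type
    // HtmlBuilder, so bold() can follow it in the chain.
    let done = b.add("x").bold("y").add("z")
    println(done.parts)
}
```

If add were declared to return Builder instead of This, the call to bold after add would not type-check. As written, the sketch should print x<b>y</b>z.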
Creating Objects
After defining a class type, objects can be created by calling its constructor (via the class type name). For example, in the following code, Rectangle(10, 20) creates an object of type Rectangle and assigns it to variable r. After creation, (public-modified) instance member variables and instance member functions can be accessed through the object. For example, r.width and r.height access the values of width and height in r, respectively, and r.area() calls the member function area.
class Rectangle {
let width: Int64
let height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
this.width * this.height
}
}
main() {
let r = Rectangle(10, 20) // r.width = 10, r.height = 20
let width = r.width // width = 10
let height = r.height // height = 20
let a = r.area() // a = 200
}
If you wish to modify member variable values through objects (not recommended; it’s better to modify them via member functions), the member variables in the class must be defined as mutable (using var). Example:
class Rectangle {
public var width: Int64
public var height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
width * height
}
}
main() {
let r = Rectangle(10, 20) // r.width = 10, r.height = 20
r.width = 8 // r.width = 8
r.height = 24 // r.height = 24
let a = r.area() // a = 192
}
Unlike struct, when objects are assigned or passed as parameters, they are not copied. Multiple variables point to the same object, so modifying a member through one variable affects the corresponding member in other variables. For example, in the following code, after assigning r1 to r2, modifying r1.width and r1.height also changes r2.width and r2.height.
class Rectangle {
var width: Int64
var height: Int64
public init(width: Int64, height: Int64) {
this.width = width
this.height = height
}
public func area() {
this.width * this.height
}
}
main() {
var r1 = Rectangle(10, 20) // r1.width = 10, r1.height = 20
var r2 = r1 // r2.width = 10, r2.height = 20
r1.width = 8 // r1.width = 8
r1.height = 24 // r1.height = 24
let a1 = r1.area() // a1 = 192
let a2 = r2.area() // a2 = 192
}
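To make the contrast with struct concrete, the sketch below performs the same assignment and modification on a struct and on a class. SPoint and CPoint are hypothetical names used only for this comparison.

```cangjie
// Hypothetical example: SPoint is a value type, CPoint is a reference type.
struct SPoint {
    public var x: Int64
    public init(x: Int64) {
        this.x = x
    }
}

class CPoint {
    public var x: Int64
    public init(x: Int64) {
        this.x = x
    }
}

main() {
    var s1 = SPoint(1)
    let s2 = s1 // struct: s2 is an independent copy
    s1.x = 9

    let c1 = CPoint(1)
    let c2 = c1 // class: c1 and c2 refer to the same object
    c1.x = 9

    println("struct copy: ${s2.x}")
    println("class alias: ${c2.x}")
}
```

Under these assumptions the struct copy keeps its old value and the sketch should print struct copy: 1, while the class variable observes the change and the sketch should print class alias: 9.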
Class Inheritance
Like most programming languages that support class, Cangjie’s class also supports inheritance. If class B inherits from class A, A is called the parent class, and B is called the child class. The child class inherits all members of the parent class except private members and constructors.
Abstract classes are always inheritable, so the open modifier for abstract class definitions is optional. Alternatively, an abstract class can be modified with sealed, indicating it can only be inherited within its package. However, non-abstract classes can only be inherited if they are defined with the open modifier. When an open-modified instance member is inherited by a class, the open modifier is also inherited. If a non-open class contains open members, the compiler will issue a warning.
The parent class can be specified in the child class definition using <:, but the parent class must be inheritable. For example, in the following code, class A is modified with open, so it can be inherited by class B. However, since class B is not inheritable, C will report an error when attempting to inherit from B.
open class A {
let a: Int64 = 10
}
class B <: A { // OK: 'B' inherits from 'A'
let b: Int64 = 20
}
class C <: B { // Error, 'B' is not inheritable
let c: Int64 = 30
}
class supports only single inheritance, so the following code attempting to inherit from two classes is invalid (& is syntax for implementing multiple interfaces; see Interfaces).
open class A {
let a: Int64 = 10
}
open class B {
let b: Int64 = 20
}
class C <: A & B { // Error, 'C' can only inherit one class
let c: Int64 = 30
}
Because classes support only single inheritance, any class can have at most one direct parent class. For classes defined with a parent class, the direct parent is the specified class. For classes defined without a parent class, the direct parent is the Object type. Object is the parent of all classes (note: Object has no direct parent and contains no members).
Because child classes inherit from parent classes, child class objects can naturally be used as parent class objects, but the reverse is not true. For example, in the following code, B is a child of A, so a B-type object can be assigned to an A-type variable, but an A-type object cannot be assigned to a B-type variable.
open class A {
let a: Int64 = 10
}
class B <: A {
let b: Int64 = 20
}
let a: A = B() // OK: subclass objects can be assigned to superclass variables
open class A {
let a: Int64 = 10
}
class B <: A {
let b: Int64 = 20
}
let b: B = A() // Error, superclass objects can not be assigned to subclass variables
A type defined with class cannot inherit from itself.
class A <: A {} // Error, 'A' inherits itself
Abstract classes can use the sealed modifier, indicating that the class can only be inherited by other classes within the package where it is defined. The sealed modifier already implies public and open semantics, so the compiler issues a warning if public or open is also written on a sealed abstract class. Subclasses of a sealed class do not have to be sealed themselves: they can be modified with open or sealed, or use no inheritance modifier at all. If a subclass of a sealed class is modified with open, that subclass can be inherited outside the package. Subclasses of sealed classes do not need to be modified with public.
package A
public sealed abstract class C1 {} // Warning, redundant modifier, 'sealed' implies 'public'
sealed open abstract class C2 {} // Warning, redundant modifier, 'sealed' implies 'open'
sealed abstract class C3 {} // OK, 'public' is optional when 'sealed' is used
class S1 <: C1 {} // OK
public open class S2 <: C1 {} // OK
public sealed abstract class S3 <: C1 {} // OK
open class S4 <: C1 {} // OK
package B
import A.*
class SS1 <: S2 {} // OK
class SS2 <: S3 {} // Error, S3 is sealed class, cannot be inherited here
sealed class SS3 {} // Error, 'sealed' cannot be used on non-abstract class
Superclass Constructor Invocation
The init constructor of a subclass can call the superclass constructor using the form super(args), or call another constructor of the same class using this(args), but only one of them can be called. If called, it must be the first expression in the constructor body, with no preceding expressions or declarations allowed.
open class A {
A(let a: Int64) {}
}
class B <: A {
let b: Int64
init(b: Int64) {
super(30)
this.b = b
}
init() {
this(20)
}
}
In the primary constructor of a subclass, the superclass constructor can be called using super(args), but other constructors of the same class cannot be called using this(args).
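A minimal sketch of a subclass primary constructor that calls the superclass constructor is shown here. The names Base, Derived, id, and tag are hypothetical and serve only as an illustration.

```cangjie
// Hypothetical example: Base and Derived are illustrative names.
open class Base {
    public Base(let id: Int64) {}
}

class Derived <: Base {
    // 'id' is a regular parameter forwarded to the superclass;
    // 'tag' is a member variable parameter of Derived.
    public Derived(id: Int64, let tag: String) {
        super(id)
        // this(...) is not allowed in a primary constructor
    }
}

main() {
    let d = Derived(7, "demo")
    println("${d.id} ${d.tag}")
}
```

Under these assumptions the sketch should print 7 demo: the inherited member id comes from Base, and tag comes from Derived.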
If a subclass constructor does not explicitly call a superclass constructor or another constructor, the compiler will insert a call to the parameterless constructor of the direct superclass at the beginning of the constructor body. If the superclass does not have a parameterless constructor, a compilation error will occur.
open class A {
let a: Int64
init() {
a = 100
}
}
open class B <: A {
let b: Int64
init(b: Int64) {
// OK, `super()` added by compiler
this.b = b
}
}
open class C <: B {
let c: Int64
init(c: Int64) { // Error, there is no parameterless constructor in the superclass
this.c = c
}
}
Overriding and Redefinition
In a subclass, a non-abstract instance member function of the parent class can be overridden by defining a function with the same name, which gives the function a new implementation in the subclass. The member function in the parent class must be modified with open; the function in the subclass is modified with override, although the override modifier is optional. In the following example, the function f in subclass B overrides the function f in parent class A.
open class A {
public open func f(): Unit {
println("I am superclass")
}
}
class B <: A {
public override func f(): Unit {
println("I am subclass")
}
}
main() {
let a: A = A()
let b: A = B()
a.f()
b.f()
}
For overridden functions, the version called is determined by the runtime type of the variable (determined by the actual object assigned to the variable), known as dynamic dispatch. For example, in the above example, the runtime type of a is A, so a.f() calls the function f in parent class A; the runtime type of b is B (compile-time type is A), so b.f() calls the function f in subclass B. Therefore, the program will output:
I am superclass
I am subclass
A subclass can likewise redefine a non-abstract static function of the parent class by defining a static function with the same name, which gives the function a new implementation in the subclass. The static function in the subclass is modified with redef, although the redef modifier is optional. In the following example, the function foo in subclass D redefines the function foo in parent class C.
open class C {
public static func foo(): Unit {
println("I am class C")
}
}
class D <: C {
public redef static func foo(): Unit {
println("I am class D")
}
}
main() {
C.foo()
D.foo()
}
For redefined functions, the version called is determined by the class type used in the call. In the above example, C.foo() calls the function foo in parent class C, and D.foo() calls the function foo in subclass D. Therefore, the program will output:
I am class C
I am class D
If an abstract function or a function modified with open has named parameters, the implementing function or the function modified with override must maintain the same named parameters.
open class A {
public open func f(a!: Int32): Int32 {
a + 1
}
}
class B <: A {
public override func f(a!: Int32): Int32 { // OK
a + 2
}
}
class C <: A {
public override func f(b!: Int32): Int32 { // Error
b + 3
}
}
main() {
B().f(a: 0)
C().f(b: 0)
}
It is also important to note that when implementing or redefining a generic function, the type parameter constraints of the subtype function must be looser or the same as those of the corresponding function in the parent type.
open class A {}
open class B <: A {}
open class C <: B {}
open class Base {
public open func foo<T>(a: T): Unit where T <: B {}
public open func bar<T>(a: T): Unit where T <: B {}
public static func f<T>(a: T): Unit where T <: B {}
public static func g<T>(): Unit where T <: B {}
}
class D <: Base {
public override func foo<T>(a: T): Unit where T <: C {} // Error, stricter constraint
public override func bar<T>(a: T): Unit where T <: C {} // Error, stricter constraint
public redef static func f<T>(a: T): Unit where T <: C {} // Error, stricter constraint
public redef static func g<T>(): Unit where T <: C {} // Error, stricter constraint
}
class E <: Base {
public override func foo<T>(a: T): Unit where T <: A {} // OK: looser constraint
public override func bar<V>(a: V): Unit where V <: A {} // OK: looser constraint, names of generic parameters do not matter
public redef static func f<T>(a: T): Unit where T <: A {} // OK: looser constraint
public redef static func g<T>(): Unit where T <: A {} // OK: looser constraint
}
class F <: Base {
public override func foo<T>(a: T): Unit where T <: B {} // OK: same constraint
public override func bar<V>(a: V): Unit where V <: B {} // OK: same constraint
public redef static func f<T>(a: T): Unit where T <: B {} // OK: same constraint
public redef static func g<T>(): Unit where T <: B {} // OK: same constraint
}
Interface
An interface is used to define an abstract type that contains no data but can specify the behavior of a type. A type is said to implement an interface if it declares to implement that interface and provides implementations for all its members.
Interface members may include:
- Member functions
- Operator overload functions
- Member properties
All these members are abstract, requiring the implementing type to provide corresponding implementations.
Interface Definition
A simple interface definition is as follows:
interface I { // 'open' modifier is optional.
func f(): Unit
}
Interfaces are declared using the interface keyword, followed by the interface identifier I and its members. Interface members can be modified with the optional open modifier.
When interface I declares a member function f, any type implementing I must provide a corresponding implementation of f.
Since interface inherently has open semantics, the open modifier in interface definitions is optional.
As shown in the following code, a class Foo is defined using the syntax Foo <: I to declare that Foo implements interface I.
Foo must contain implementations for all members declared in I, meaning it needs to define an f of the same type; otherwise, a compilation error will occur due to unimplemented interface requirements.
class Foo <: I {
public func f(): Unit {
println("Foo")
}
}
main() {
let a = Foo()
let b: I = a
b.f() // "Foo"
}
When a type implements an interface, it becomes a subtype of that interface.
In the above example, Foo is a subtype of I, so any instance of Foo can be used as an instance of I.
In main, a Foo-type variable a is assigned to an I-type variable b. When calling function f on b, the f implementation from Foo is executed, printing:
Foo
An interface can also be modified with sealed to indicate that it can only be inherited, implemented, or extended within the package where the interface is defined. sealed inherently implies public/open semantics, so defining a sealed interface with additional public/open modifiers will trigger compiler warnings. Child interfaces inheriting from a sealed interface or abstract classes implementing it may still be marked sealed or left unmodified. If a child interface of a sealed interface is marked public but not sealed, it can be inherited, implemented, or extended outside the package. Types inheriting or implementing a sealed interface need not be marked public.
package A
public interface I1 {}
sealed interface I2 {} // OK
public sealed interface I3 {} // Warning, redundant modifier, 'sealed' implies 'public'
sealed open interface I4 {} // Warning, redundant modifier, 'sealed' implies 'open'
class C1 <: I1 {}
public open class C2 <: I1 {}
sealed abstract class C3 <: I2 {}
extend Int64 <: I2 {}
package B
import A.*
class S1 <: I1 {} // OK
class S2 <: I2 {} // Error, I2 is sealed interface, cannot be inherited here.
Through this constraint mechanism, interfaces can define common functionalities for a series of types, achieving the purpose of functional abstraction.
For example, the following code defines a Flyable interface and has other classes with flying capabilities implement it.
interface Flyable {
func fly(): Unit
}
class Bird <: Flyable {
public func fly(): Unit {
println("Bird flying")
}
}
class Bat <: Flyable {
public func fly(): Unit {
println("Bat flying")
}
}
class Airplane <: Flyable {
public func fly(): Unit {
println("Airplane flying")
}
}
func fly(item: Flyable): Unit {
item.fly()
}
main() {
let bird = Bird()
let bat = Bat()
let airplane = Airplane()
fly(bird)
fly(bat)
fly(airplane)
}
Compiling and executing the above code yields the following output:
Bird flying
Bat flying
Airplane flying
Interface members can be either instance or static. The previous examples demonstrated instance member functions; now let’s examine static member functions.
Static member functions, like instance member functions, require implementing types to provide implementations.
For example, the following code defines a NamedType interface containing a static member function typename to retrieve the string name of each type.
Other types implementing NamedType must provide an implementation of typename, enabling safe retrieval of type names for all subtypes of NamedType.
interface NamedType {
static func typename(): String
}
class A <: NamedType {
public static func typename(): String {
"A"
}
}
class B <: NamedType {
public static func typename(): String {
"B"
}
}
main() {
println("the type is ${ A.typename() }")
println("the type is ${ B.typename() }")
}
The program outputs:
the type is A
the type is B
Static member functions (or properties) in interfaces may either have no default implementation or provide one.
When no default implementation exists, the member cannot be accessed via the interface type name. For example, the following code triggers a compilation error when attempting to access typename directly through NamedType, because NamedType lacks an implementation of typename.
interface NamedType {
static func typename(): String
}
main() {
NamedType.typename() // Error
}
Static member functions (or properties) in interfaces can also have default implementations. When a type inherits from an interface with default static function (or property) implementations, that type may omit reimplementing the static member, which can then be accessed directly through either the interface name or the type name. In the following example, NamedType’s typename member function has a default implementation, which A doesn’t need to reimplement, while still allowing direct access via both the interface and type names.
interface NamedType {
static func typename(): String {
"interface NamedType"
}
}
class A <: NamedType {}
main() {
println(NamedType.typename())
println(A.typename())
}
The program outputs:
interface NamedType
interface NamedType
Such static members are typically used in generic functions through generic constraints.
For example, the printTypeName function below constrains the generic parameter T to be a subtype of NamedType, ensuring that all static member functions (or properties) in the instantiated type of T have implementations, enabling access via T.typename. This achieves abstraction of static members. See Generics for details.
interface NamedType {
static func typename(): String
}
interface I <: NamedType {
static func typename(): String {
f()
}
static func f(): String
}
class A <: NamedType {
public static func typename(): String {
"A"
}
}
class B <: NamedType {
public static func typename(): String {
"B"
}
}
func printTypeName<T>() where T <: NamedType {
println("the type is ${ T.typename() }")
}
main() {
printTypeName<A>() // OK
printTypeName<B>() // OK
printTypeName<I>() // Error, 'I' must implement all static functions; otherwise the unimplemented 'f' would be called.
}
Interfaces can define generic instance member functions or generic static member functions, which, like non-generic functions, have open semantics.
import std.collection.ArrayList
interface M {
func foo<T>(a: T): T
static func toString<T>(b: ArrayList<T>): String where T <: ToString
}
class C <: M {
public func foo<S>(a: S): S { // implements M::foo, names of generic parameters do not matter
a
}
public static func toString<T>(b: ArrayList<T>) where T <: ToString {
var res = ""
for (s in b) {
res += s.toString()
}
res
}
}
Note that interface members are inherently public and cannot be declared with additional access control modifiers. Implementing types must also use public implementations.
interface I {
func f(): Unit
}
open class C <: I {
protected func f() {} // Compile error: the implementation of f must be public
}
Interface Inheritance and Interface Implementation
When implementing multiple interfaces for a type, you can use & to separate multiple interfaces in the declaration, with no specific order required between the implemented interfaces.
For example, the following code allows MyInt to implement both the Addable and Subtractable interfaces.
interface Addable {
func add(other: Int64): Int64
}
interface Subtractable {
func sub(other: Int64): Int64
}
class MyInt <: Addable & Subtractable {
var value = 0
public func add(other: Int64): Int64 {
value + other
}
public func sub(other: Int64): Int64 {
value - other
}
}
An interface can inherit one or more interfaces but cannot inherit a class. Additionally, new interface members can be added during interface inheritance.
For example, the Calculable interface in the following code inherits both the Addable and Subtractable interfaces and adds two member functions for multiplication and division.
interface Addable {
func add(other: Int64): Int64
}
interface Subtractable {
func sub(other: Int64): Int64
}
interface Calculable <: Addable & Subtractable {
func mul(other: Int64): Int64
func div(other: Int64): Int64
}
When a type implements the Calculable interface, it must implement all four functions (add, sub, mul, and div), with no members omitted.
class MyInt <: Calculable {
var value = 0
public func add(other: Int64): Int64 {
value + other
}
public func sub(other: Int64): Int64 {
value - other
}
public func mul(other: Int64): Int64 {
value * other
}
public func div(other: Int64): Int64 {
value / other
}
}
By implementing Calculable, MyInt also implements all interfaces inherited by Calculable. Thus, MyInt is also a subtype of Addable and Subtractable.
main() {
let myInt = MyInt()
let add: Addable = myInt
let sub: Subtractable = myInt
let calc: Calculable = myInt
}
For interface inheritance, if a child interface inherits a function or property with a default implementation from its parent interface, it cannot merely declare that function or property (i.e., without a default implementation). It must provide a new default implementation. The override or redef modifier before the function definition is optional. If a child interface inherits a function or property without a default implementation from its parent interface, it can either declare it or provide a default implementation, with the override or redef modifier being optional. The redef modifier specifically targets the redefinition of static functions with the same name in child classes.
interface I1 {
func f(a: Int64) {
a
}
static func g(a: Int64) {
a
}
func f1(a: Int64): Unit
static func g1(a: Int64): Unit
}
interface I2 <: I1 {
/*'override' is optional*/ func f(a: Int64) {
a + 1
}
override func f(a: Int32) {} // Error, override function 'f' does not have an overridden function from its supertypes
static /*'redef' is optional*/ func g(a: Int64) {
a + 1
}
/*'override' is optional*/ func f1(a: Int64): Unit {}
static /*'redef' is optional*/ func g1(a: Int64): Unit {}
}
Requirements for Interface Implementation
In Cangjie, all types except Tuple, VArray, and functions can implement interfaces.
A type can implement an interface in three ways:
- Declaring interface implementation during type definition, as shown in the examples above.
- Implementing interfaces via extensions (see Extensions for details).
- Built-in language implementation (refer to the relevant documentation in the Cangjie Programming Language Library API).
When a type declares interface implementation, it must implement all required members of the interface, adhering to the following rules:
- For member functions and operator overload functions, the implementing type must provide functions with the same name, parameter list, and return type as those specified in the interface.
- For member properties, the mut modifier must be consistent, and the property types must match.
In most cases, as shown in the examples above, the implementing type must contain implementations matching the interface’s requirements.
However, there is one exception: if the return type of a member function or operator overload function in the interface is a class type, the implementing function’s return type can be a subtype of that class.
For example, in the following code, the return type of f in I is the class type Base, so the return type of f in C can be Sub, a subtype of Base.
open class Base {}
class Sub <: Base {}
interface I {
func f(): Base
}
class C <: I {
public func f(): Sub {
Sub()
}
}
Additionally, interface members can provide default implementations. For example, in the following code, say in SayHi has a default implementation, so A can inherit this implementation when implementing SayHi, while B can choose to provide its own implementation of say.
interface SayHi {
func say() {
"hi"
}
}
class A <: SayHi {}
class B <: SayHi {
public func say() {
"hi, B"
}
}
Notably, if a type implements multiple interfaces that contain default implementations for the same member, a multiple inheritance conflict occurs. The language cannot determine the most suitable implementation, so the default implementations become invalid, and the implementing type must provide its own implementation.
For example, in the following code, both SayHi and SayHello contain implementations of say. When Foo implements these two interfaces, it must provide its own implementation; otherwise, a compilation error will occur.
interface SayHi {
func say() {
"hi"
}
}
interface SayHello {
func say() {
"hello"
}
}
class Foo <: SayHi & SayHello {
public func say() {
"Foo"
}
}
For struct, enum, and class, the override modifier (or redef modifier) before function or property definitions is optional when implementing interfaces, regardless of whether the interface’s functions or properties have default implementations.
interface I {
func foo(): Int64 {
return 0
}
}
enum E <: I{
elem
public override func foo(): Int64 {
return 1
}
}
struct S <: I {
public override func foo(): Int64 {
return 1
}
}
The Any Type
Any is a built-in interface defined as follows:
interface Any {}
In Cangjie, all interfaces implicitly inherit from Any, and all non-interface types implicitly implement it. Therefore, all types can be used as subtypes of Any.
For example, in the following code, variables of different types can be assigned to an Any-type variable.
main() {
var any: Any = 1
any = 2.0
any = "hello, world!"
}
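A value held as Any keeps its runtime type, which can be recovered later. The following is a minimal sketch, assuming the type pattern (id: Type) from the pattern matching chapter; the function name describe is hypothetical and used only for illustration.

```cangjie
// Hypothetical helper: inspect the runtime type behind an Any value.
func describe(v: Any): String {
    match (v) {
        case i: Int64 => "Int64 value ${i}"
        case s: String => "String value ${s}"
        case _ => "some other type"
    }
}

main() {
    println(describe(1))        // Int64 value 1
    println(describe("hello"))  // String value hello
    println(describe(2.0))      // some other type
}
```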
Properties
Properties provide a getter and an optional setter to indirectly retrieve and modify values.
When using properties, they behave no differently from ordinary variables—you only need to manipulate the data without being aware of the internal implementation. This facilitates mechanisms such as access control, data monitoring, debugging, and data binding.
Properties can be used as expressions or assigned values. Here, classes and interfaces are used as examples, but properties are not limited to these.
The following is a simple example where b is a typical property that encapsulates external access to a:
class Foo {
private var a = 0
public mut prop b: Int64 {
get() {
println("get")
a
}
set(value) {
println("set")
a = value
}
}
}
main() {
var x = Foo()
let y = x.b + 1 // get
x.b = y // set
}
Here, Foo provides a property named b. For the getter/setter functionality, Cangjie provides get and set syntax for definition. When a variable x of type Foo accesses b, the get operation of b is called, returning a value of type Int64, which can then be used to add to 1. When x assigns a value to b, the set operation of b is called, passing the value of y to the value parameter of set, and ultimately assigning value to a.
Through property b, external code remains completely unaware of the member variable a in Foo, yet can achieve the same access and modification operations via b, demonstrating effective encapsulation. Thus, the program outputs:
get
set
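Because a setter is ordinary code, it can also validate the assigned value. The sketch below is illustrative only: the class Volume and its property percent are hypothetical names, and the setter clamps any assigned value to the range 0 to 100.

```cangjie
// Hypothetical class: the setter of `percent` clamps values to 0..100.
class Volume {
    private var level = 0
    public mut prop percent: Int64 {
        get() {
            level
        }
        set(v) {
            if (v < 0) {
                level = 0
            } else if (v > 100) {
                level = 100
            } else {
                level = v
            }
        }
    }
}

main() {
    let vol = Volume()
    vol.percent = 150
    println(vol.percent)  // 100
    vol.percent = 42
    println(vol.percent)  // 42
}
```

Callers still write a plain assignment, so the validation rule can change later without touching the code that uses the property.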
Property Definition
Properties can be defined in interface, class, struct, enum, and extend.
A typical property syntax structure is as follows:
class Foo {
public prop a: Int64 {
get() { 0 }
}
public mut prop b: Int64 {
get() { 0 }
set(v) {}
}
}
Here, a and b declared with prop are both properties, and both are of type Int64. a is a property without the mut modifier—such properties must define only a getter (for value retrieval). b is a property with the mut modifier—such properties must define both a getter (for value retrieval) and a setter (for value assignment).
Note:
For numeric types, tuples, functions, Bool, Unit, Nothing, String, Range, and enum types, mut properties cannot be declared in their extensions or declarations, nor can interfaces with mut properties be implemented.
The getter and setter of a property correspond to two distinct functions.
- The getter function type is () -> T, where T is the property’s type. The getter function is executed when the property is used as an expression.
- The setter function type is (T) -> Unit, where T is the property’s type. The parameter name must be explicitly specified. The setter function is executed when the property is assigned a value.
The implementations of the getter and setter can include declarations and expressions, just like function bodies, following the same rules as function bodies. For details, refer to the Function Body section.
The parameter in the setter corresponds to the value passed during assignment.
class Foo {
private var j = 0
public mut prop i: Int64 {
get() {
j
}
set(v) {
j = v
}
}
}
Note that accessing the property itself within its getter or setter constitutes a recursive call, which, as with function calls, may lead to infinite recursion.
Modifiers
Modifiers can be declared before prop.
class Foo {
public prop a: Int64 {
get() {
0
}
}
private prop b: Int64 {
get() {
0
}
}
}
Like member functions, member properties also support the open, override, and redef modifiers, allowing properties to be overridden/redefined in subtypes.
When a subtype overrides a property from its parent type, if the parent property has the mut modifier, the subtype property must also have the mut modifier and maintain the same type.
As shown in the following code, A defines two properties, x and y. B can override/redefine x and y using override/redef:
open class A {
private var valueX = 0
private static var valueY = 0
public open prop x: Int64 {
get() { valueX }
}
public static mut prop y: Int64 {
get() { valueY }
set(v) {
valueY = v
}
}
}
class B <: A {
private var valueX2 = 0
private static var valueY2 = 0
public override prop x: Int64 {
get() { valueX2 }
}
public redef static mut prop y: Int64 {
get() { valueY2 }
set(v) {
valueY2 = v
}
}
}
Abstract Properties
Similar to abstract functions, abstract properties can be declared in interfaces and abstract classes. These abstract properties have no implementation.
interface I {
prop a: Int64
}
abstract class C {
public prop a: Int64
}
When a type implements an interface, or a non-abstract subclass inherits from an abstract class, these abstract properties must be implemented.
Like the rules for overriding, if the parent property has the mut modifier, the subtype property must also have the mut modifier and maintain the same type.
interface I {
prop a: Int64
mut prop b: Int64
}
class C <: I {
private var value = 0
public prop a: Int64 {
get() { value }
}
public mut prop b: Int64 {
get() { value }
set(v) {
value = v
}
}
}
Abstract properties allow interfaces and abstract classes to specify data operations in a more user-friendly manner, making them more intuitive compared to function-based approaches.
As shown in the following code, if you want to specify the retrieval and modification of a size value, using properties (I1) results in less code and aligns better with the intent of data manipulation compared to using functions (I2).
interface I1 {
mut prop size: Int64
}
interface I2 {
func getSize(): Int64
func setSize(value: Int64): Unit
}
class C <: I1 & I2 {
private var mySize = 0
public mut prop size: Int64 {
get() {
mySize
}
set(value) {
mySize = value
}
}
public func getSize() {
mySize
}
public func setSize(value: Int64) {
mySize = value
}
}
main() {
let a: I1 = C()
a.size = 5
println(a.size)
let b: I2 = C()
b.setSize(5)
println(b.getSize())
}
The program outputs:
5
5
Property Usage
Properties are divided into instance member properties and static member properties. The usage of member properties is the same as that of member variables. For details, refer to the Member Variables section.
class A {
public prop x: Int64 {
get() {
123
}
}
public static prop y: Int64 {
get() {
321
}
}
}
main() {
var a = A()
println(a.x) // 123
println(A.y) // 321
}
The result is:
123
321
Properties without the mut modifier are similar to variables declared with let and cannot be assigned.
class A {
private let value = 0
public prop i: Int64 {
get() {
value
}
}
}
main() {
var x = A()
println(x.i) // OK
x.i = 1 // Error
}
Properties marked with the mut modifier are similar to variables declared with var, allowing both value retrieval and assignment.
class A {
private var value: Int64 = 0
public mut prop i: Int64 {
get() {
value
}
set(v) {
value = v
}
}
}
main() {
var x = A()
println(x.i) // OK
x.i = 1 // OK
}
The program outputs:
0
Subtype Relationships
Like other object-oriented languages, the Cangjie language provides subtype relationships and subtype polymorphism. Examples include, but are not limited to, the following:
- If a function’s formal parameter is of type T, the actual type of the argument passed during the function call can be either T or a subtype of T (strictly speaking, the subtypes of T already include T itself, and the same applies below).
- If the type of the variable on the left-hand side of an assignment expression = is T, the actual type of the expression on the right-hand side of = can be either T or a subtype of T.
- If the user-annotated return type in a function definition is T, the type of the function body (and the types of all return expressions within the function body) can be either T or a subtype of T.
The following sections describe several scenarios where two types form a subtype relationship.
Subtype Relationships Introduced by Class Inheritance
After inheriting from a class, the subclass becomes a subtype of the parent class. In the following code, Sub is a subtype of Super.
open class Super {}
class Sub <: Super {}
Subtype Relationships Introduced by Interface Implementation
After implementing an interface (including extension implementations), the type implementing the interface becomes a subtype of the interface. In the following code, I3 is a subtype of I1 and I2, C is a subtype of I1, and Int64 is a subtype of I2:
interface I1 {}
interface I2 {}
interface I3 <: I1 & I2 {}
class C <: I1 {}
extend Int64 <: I2 {}
Subtype Relationships of Tuple Types
Tuple types in the Cangjie language also have subtype relationships. Intuitively, if each element type of a tuple t1 is a subtype of the corresponding element type in another tuple t2, then the type of tuple t1 is also a subtype of the type of tuple t2. For example, in the following code, since C2 <: C1 and C4 <: C3, it follows that (C2, C4) <: (C1, C3) and (C4, C2) <: (C3, C1).
open class C1 {}
class C2 <: C1 {}
open class C3 {}
class C4 <: C3 {}
let t1: (C1, C3) = (C2(), C4()) // OK
let t2: (C3, C1) = (C4(), C2()) // OK
Subtype Relationships of Function Types
In the Cangjie language, functions are first-class citizens, and function types also have subtype relationships: given two function types (U1) -> S2 and (U2) -> S1, (U1) -> S2 is a subtype of (U2) -> S1 if and only if U2 is a subtype of U1 and S2 is a subtype of S1 (note the order: parameter types are contravariant, while return types are covariant). For example, the following code defines two functions f : (U1) -> S2 and g : (U2) -> S1, where the type of f is a subtype of the type of g. Because of this, f can be used wherever g is used.
open class U1 {}
class U2 <: U1 {}
open class S1 {}
class S2 <: S1 {}
func f(a: U1): S2 { S2() }
func g(a: U2): S1 { S1() }
func call1() {
g(U2()) // OK
f(U2()) // OK
}
func h(lam: (U2) -> S1): S1 {
lam(U2())
}
func call2() {
h(g) // OK
h(f) // OK
}
For the above rule, the S2 <: S1 part is easy to understand: The result data produced by a function call will be used by subsequent programs. Function g can produce result data of type S1, and function f can produce result data of type S2. The result data produced by g should be replaceable by the result data produced by f, hence the requirement that S2 <: S1.
For the U2 <: U1 part, it can be understood as follows: Before a function call produces a result, it must be callable. The actual argument type of the function call remains fixed, while the formal parameter type can be more permissive and still be callable. However, if the formal parameter type is more restrictive, it may not be callable—for example, given the definitions in the above code, g(U2()) can be replaced with f(U2()) precisely because the actual argument type U2 is more restrictive than the formal parameter type U1.
Subtype Relationships That Always Hold
In the Cangjie language, some predefined subtype relationships always hold:
- A type T is always a subtype of itself, i.e., T <: T.
- The Nothing type is always a subtype of any other type T, i.e., Nothing <: T.
- Any type T is a subtype of the Any type, i.e., T <: Any.
- Any type defined by a class is a subtype of Object, i.e., if class C {} exists, then C <: Object.
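The relationships listed above can be observed in a short program. This is a minimal sketch: the class C and the function requirePositive are hypothetical names. It relies on throw expressions having type Nothing, which is why the else branch is accepted where an Int64 is expected.

```cangjie
class C {}

// `throw` has type Nothing, and Nothing <: Int64,
// so both branches are compatible with the Int64 return type.
func requirePositive(n: Int64): Int64 {
    if (n > 0) {
        n
    } else {
        throw Exception("n must be positive")
    }
}

main() {
    let c: C = C()
    let o: Object = c            // C <: Object
    let a: Any = o               // Object <: Any
    println(a is C)              // true: the runtime type is still C
    println(requirePositive(3))  // 3
}
```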
Subtype Relationships Introduced by Transitivity
Subtype relationships are transitive. In the following code, although only I2 <: I1, C <: I2, and Bool <: I2 are described, the transitivity of subtypes implicitly establishes C <: I1 and Bool <: I1 as subtype relationships.
interface I1 {}
interface I2 <: I1 {}
class C <: I2 {}
extend Bool <: I2 {}
Subtype Relationships of Generic Types
Generic types also have subtype relationships. For details, see Subtype Relationships of Generic Types.
Type Conversion
Cangjie does not support implicit conversion between different types (subtypes are inherently parent types, so conversion from a subtype to a parent type is not implicit type conversion). Type conversion must be performed explicitly. The following sections will introduce conversions between numeric types, conversions from Rune to UInt32 and from integer types to Rune, as well as the is and as operators.
Conversions Between Numeric Types
For numeric types (including: Int8, Int16, Int32, Int64, IntNative, UInt8, UInt16, UInt32, UInt64, UIntNative, Float16, Float32, Float64), Cangjie supports using the T(e) syntax to obtain a value equal to e with type T. Here, the expression e and type T can be any of the aforementioned numeric types.
The following example demonstrates type conversion between numeric types:
main() {
let a: Int8 = 10
let b: Int16 = 20
let r1 = Int16(a)
println("The type of r1 is 'Int16', and r1 = ${r1}")
let r2 = Int8(b)
println("The type of r2 is 'Int8', and r2 = ${r2}")
let c: Float32 = 1.0
let d: Float64 = 1.123456789
let r3 = Float64(c)
println("The type of r3 is 'Float64', and r3 = ${r3}")
let r4 = Float32(d)
println("The type of r4 is 'Float32', and r4 = ${r4}")
let e: Int64 = 1024
let f: Float64 = 1024.1024
let r5 = Float64(e)
println("The type of r5 is 'Float64', and r5 = ${r5}")
let r6 = Int64(f)
println("The type of r6 is 'Int64', and r6 = ${r6}")
}
The execution result of the above code is:
The type of r1 is 'Int16', and r1 = 10
The type of r2 is 'Int8', and r2 = 20
The type of r3 is 'Float64', and r3 = 1.000000
The type of r4 is 'Float32', and r4 = 1.123457
The type of r5 is 'Float64', and r5 = 1024.000000
The type of r6 is 'Int64', and r6 = 1024
Note:
Overflow may occur during type conversion. If the overflow can be detected by the compiler in advance, the compiler will directly report an error. Otherwise, an exception will be thrown according to the default overflow policy.
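When an out-of-range value is expected and should not raise an exception, the range can be checked before converting. The following is a sketch under that assumption: toInt8 and show are hypothetical helpers, not library functions, and the bounds are the Int8 range written out as literals.

```cangjie
// Hypothetical helper: convert only when the value fits in Int8.
func toInt8(v: Int64): Option<Int8> {
    if (v >= -128 && v <= 127) {
        Some(Int8(v))
    } else {
        None
    }
}

func show(v: Int64): Unit {
    match (toInt8(v)) {
        case Some(x) => println("${v} fits in Int8: ${x}")
        case None => println("${v} does not fit in Int8")
    }
}

main() {
    show(100)  // 100 fits in Int8: 100
    show(300)  // 300 does not fit in Int8
}
```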
Conversion from Rune to UInt32 and from Integer Types to Rune
Conversion from Rune to UInt32 uses the UInt32(e) syntax, where e is a Rune-type expression. The result of UInt32(e) is the UInt32-type integer value corresponding to the Unicode scalar value of e.
Conversion from integer types to Rune uses the Rune(num) syntax, where num can be of any integer type. The character corresponding to the Unicode scalar value is returned only when the value of num falls within [0x0000, 0xD7FF] or [0xE000, 0x10FFFF] (i.e., the Unicode scalar value range). Otherwise, a compilation error occurs (if the value of num can be determined at compile time) or an exception is thrown at runtime.
The following example demonstrates type conversion between Rune and UInt32:
main() {
let x: Rune = 'a'
let y: UInt32 = 65
let r1 = UInt32(x)
let r2 = Rune(y)
println("The type of r1 is 'UInt32', and r1 = ${r1}")
println("The type of r2 is 'Rune', and r2 = ${r2}")
}
The execution result of the above code is:
The type of r1 is 'UInt32', and r1 = 97
The type of r2 is 'Rune', and r2 = A
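The scalar value range can also be checked explicitly before calling Rune(num), so that an invalid code point yields None instead of a runtime exception. This is an illustrative sketch; toRune and report are hypothetical helper names.

```cangjie
// Hypothetical helper: returns None for values outside the Unicode scalar value range.
func toRune(n: UInt32): Option<Rune> {
    if (n <= 0xD7FF || (n >= 0xE000 && n <= 0x10FFFF)) {
        Some(Rune(n))
    } else {
        None
    }
}

func report(n: UInt32): Unit {
    match (toRune(n)) {
        case Some(r) => println("${n} converts to ${r}")
        case None => println("${n} is not a Unicode scalar value")
    }
}

main() {
    report(0x41)    // 65 converts to A
    report(0xD800)  // 55296 is not a Unicode scalar value
}
```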
The is and as Operators
Cangjie supports using the is operator to determine whether the type of an expression is the specified type (or its subtype). Specifically, for the expression e is T (where e can be any expression and T can be any type), when the runtime type of e is a subtype of T, the value of e is T is true; otherwise, it is false.
The following example demonstrates the use of the is operator:
open class Base {
var name: String = "Alice"
}
class Derived <: Base {
var age: UInt8 = 18
}
main() {
let a = 1 is Int64
println("Is the type of 1 'Int64'? ${a}")
let b = 1 is String
println("Is the type of 1 'String'? ${b}")
let b1: Base = Base()
let b2: Base = Derived()
var x = b1 is Base
println("Is the type of b1 'Base'? ${x}")
x = b1 is Derived
println("Is the type of b1 'Derived'? ${x}")
x = b2 is Base
println("Is the type of b2 'Base'? ${x}")
x = b2 is Derived
println("Is the type of b2 'Derived'? ${x}")
}
The execution result of the above code is:
Is the type of 1 'Int64'? true
Is the type of 1 'String'? false
Is the type of b1 'Base'? true
Is the type of b1 'Derived'? false
Is the type of b2 'Base'? true
Is the type of b2 'Derived'? true
The as operator can be used to convert the type of an expression to the specified type. Since type conversion may fail, the as operator returns an Option type. Specifically, for the expression e as T (where e can be any expression and T can be any type), when the runtime type of e is a subtype of T, the value of e as T is Option<T>.Some(e); otherwise, it is Option<T>.None.
The following example demonstrates the use of the as operator (comments indicate the results of the as operation):
open class Base {
var name: String = "Alice"
}
class Derived <: Base {
var age: UInt8 = 18
}
let a = 1 as Int64 // a = Option<Int64>.Some(1)
let b = 1 as String // b = Option<String>.None
let b1: Base = Base()
let b2: Base = Derived()
let d: Derived = Derived()
let r1 = b1 as Base // r1 = Option<Base>.Some(b1)
let r2 = b1 as Derived // r2 = Option<Derived>.None
let r3 = b2 as Base // r3 = Option<Base>.Some(b2)
let r4 = b2 as Derived // r4 = Option<Derived>.Some(b2)
let r5 = d as Base // r5 = Option<Base>.Some(d)
let r6 = d as Derived // r6 = Option<Derived>.Some(d)
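Since as returns an Option, its result is usually consumed with match. The sketch below assumes nothing beyond the Option patterns Some and None; the member function hello and the function tryHello are hypothetical names.

```cangjie
open class Base {}
class Derived <: Base {
    func hello() {
        println("hello from Derived")
    }
}

// Call hello only when the runtime type of b is Derived.
func tryHello(b: Base) {
    match (b as Derived) {
        case Some(d) => d.hello()
        case None => println("not a Derived")
    }
}

main() {
    tryHello(Derived())  // hello from Derived
    tryHello(Base())     // not a Derived
}
```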
Generics Overview
In the Cangjie programming language, generics refer to parameterized types: types that are unknown at declaration time and must be specified at the point of use. Both type declarations and function declarations can be generic. The most common examples are container types such as Array<T> and Set<T>.
In Cangjie, declarations of function, class, interface, struct, and enum can all declare type parameters, meaning they can all be generic.
For ease of discussion, the following commonly used terms are defined:
- Type Parameter: A type or function declaration may have one or more types that need to be specified at the point of use. These types are referred to as type parameters. When declaring a type parameter, an identifier must be provided so that it can be referenced within the declaration body.
- Type Variable: After a type parameter is declared, the identifier that refers to it within the declaration is called a type variable.
- Type Argument: The types supplied for the type parameters when a generic type or function is used are called type arguments.
- Type Constructor: A type that requires zero, one, or more types as arguments is called a type constructor.
Type parameters are generally declared after the type name or function name, enclosed in angle brackets <...>. For example, a generic list can be declared as:
class List<T> {
var elem: Option<T> = None
var tail: Option<List<T>> = None
}
func sumInt(a: List<Int64>) { }
Here, T in List<T> is called a type parameter. The reference to T in elem: Option<T> is called a type variable, and similarly, T in tail: Option<List<T>> is also a type variable. In the parameter of the function sumInt, Int64 in List<Int64> is called the type argument for List. List is the type constructor, and List<Int64> constructs a list type for Int64 using the type argument Int64.
Generic Functions
A function is called a generic function if it declares one or more type parameters. Syntactically, type parameters are placed immediately after the function name, enclosed in <>, and separated by , if there are multiple type parameters.
Global Generic Functions
When declaring a global generic function, simply declare the type parameters using angle brackets after the function name. These type parameters can then be referenced in the function parameters, return type, and function body. For example, the id function is defined as:
func id<T>(a: T): T {
return a
}
Here, (a: T) is the function parameter declaration, which uses the type parameter T declared by the id function, and the return type of id also utilizes this type parameter.
Another more complex example is the generic function composition, which declares three type parameters T1, T2, T3. Its functionality is to compose two functions f: (T1) -> T2 and g: (T2) -> T3 into a function of type (T1) -> T3.
func composition<T1, T2, T3>(f: (T1) -> T2, g: (T2) -> T3): (T1) -> T3 {
return {x: T1 => g(f(x))}
}
Since the functions that can be composed can be of any type (e.g., composition of (Int32) -> Bool and (Bool) -> Int64, or (Int64) -> Rune and (Rune) -> Int8), generic functions are necessary.
func times2(a: Int64): Int64 {
return a * 2
}
func plus10(a: Int64): Int64 {
return a + 10
}
func times2plus10(a: Int64) {
return composition<Int64, Int64, Int64>(times2, plus10)(a)
}
main() {
println(times2plus10(9))
}
Here, composing two (Int64) -> Int64 functions first multiplies 9 by 2 and then adds 10, resulting in 28.
28
Local Generic Functions
Local functions can also be generic. For example, the generic function id can be nested within other functions:
func foo(a: Int64) {
func id<T>(a: T): T { a }
func double(a: Int64): Int64 { a + a }
return (id<Int64> ~> double)(a) == (double ~> id<Int64>)(a)
}
main() {
println(foo(1))
}
Due to the identity property of id, the functions id<Int64> ~> double and double ~> id<Int64> are equivalent, resulting in true.
true
Generic Member Functions
Member functions of classes, structs, and enums can be generic. For example:
class A {
func foo<T>(a: T): Unit where T <: ToString {
println("${a}")
}
}
struct B {
func bar<T>(a: T): Unit where T <: ToString {
println("${a}")
}
}
enum C {
| X | Y
func coo<T>(a: T): Unit where T <: ToString {
println("${a}")
}
}
main() {
var a = A()
var b = B()
var c = C.X
a.foo<Int64>(10)
b.bar<String>("abc")
c.coo<Bool>(false)
}
The program output will be:
10
abc
false
When extending types using the extend declaration, the functions within the extension can also be generic. For example, we can add a generic member function to the Int64 type:
extend Int64 {
func printIntAndArg<T>(a: T) where T <: ToString {
println(this)
println("${a}")
}
}
main() {
var a: Int64 = 12
a.printIntAndArg<String>("twelve")
}
The program output will be:
12
twelve
Static Generic Functions
Interfaces, classes, structs, enums, and extensions can define static generic functions. For example, the following ToPair class returns a tuple from an ArrayList:
import std.collection.ArrayList
class ToPair {
public static func fromArray<T>(l: ArrayList<T>): (T, T) {
return (l[0], l[1])
}
}
main() {
var res: ArrayList<Int64> = ArrayList([1,2,3,4])
var a: (Int64, Int64) = ToPair.fromArray<Int64>(res)
}
Generic Interfaces
Generics can be used to define generic interfaces. Taking the Iterable interface from the standard library as an example, its member function iterator needs to return an Iterator type, which is used to traverse a container. Iterator is a generic interface that contains a next member function for retrieving the next element from the container type. The return type of the next member function is a type that needs to be specified during usage, so Iterator requires the declaration of generic parameters.
public interface Iterable<E> {
func iterator(): Iterator<E>
}
public interface Iterator<E> <: Iterable<E> {
func next(): Option<E>
}
public interface Collection<T> <: Iterable<T> {
prop size: Int64
func isEmpty(): Bool
}
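A user-defined generic interface is implemented in the same way as a non-generic one, with the type argument passed through from the implementing type. The following is a minimal sketch; Container and Box are hypothetical names and are not part of the standard library.

```cangjie
// Hypothetical generic interface with a single member function.
interface Container<T> {
    func get(): T
}

// Box<T> implements Container<T> for the same T.
class Box<T> <: Container<T> {
    let item: T
    public init(item: T) {
        this.item = item
    }
    public func get(): T {
        item
    }
}

main() {
    let b: Container<Int64> = Box<Int64>(42)
    println(b.get())  // 42
}
```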
Generic Classes
Generic Interfaces introduced the definition and usage of generic interfaces. This section covers the definition and usage of generic classes. For example, the key-value pairs in Map are defined using generic classes.
The Node type for key-value pairs in Map can be defined using a generic class:
open class Node<K, V> where K <: Hashable & Equatable<K> {
var key: Option<K> = Option<K>.None
var value: Option<V> = Option<V>.None
init() {}
init(key: K, value: V) {
this.key = Option<K>.Some(key)
this.value = Option<V>.Some(value)
}
}
Since the types of keys and values may differ and can be any type that meets certain conditions, the Node class requires two type parameters K and V. The constraint K <: Hashable & Equatable<K> specifies that the key type K must implement both the Hashable and Equatable<K> interfaces, which are the conditions K must satisfy. For more details on generic constraints, refer to the Generic Constraints section.
Because static member variables of generic classes share memory, the type declarations and expressions of static member variables or properties cannot reference type parameters or contain uninstantiated generic type expressions. Additionally, static variable or property initialization expressions cannot call static member functions or properties of generic classes.
class A<T> {}
class B<T> {
static func foo() {1}
static var err1: A<T> = A<T>() // Error, static member cannot depend on generic parameter 'Generics-T'
static prop err2: A<T> { // Error, static member cannot depend on generic parameter 'Generics-T'
get() {
A<T>() // Error, static member cannot depend on generic parameter 'Generics-T'
}
}
static var vfoo = foo() // Error, it's equal to 'static var vfoo = B<T>.foo()', implicit reference to generic 'T'.
static var ok: Int64 = 1
}
main() {
B<Int32>.ok = 2
println(B<Int64>.ok) // 2
}
Generic Structs
Generic struct types are similar to generic classes. Below is an example that uses a struct to define a type similar to a two-element tuple:
struct Pair<T, U> {
let x: T
let y: U
public init(a: T, b: U) {
x = a
y = b
}
public func first(): T {
return x
}
public func second(): U {
return y
}
}
main() {
var a: Pair<String, Int64> = Pair<String, Int64>("hello", 0)
println(a.first())
println(a.second())
}
The program output is:
hello
0
The Pair struct provides two functions first and second to retrieve the first and second elements of the tuple respectively.
Generic Enums
In Cangjie, the Option type is a classic example of a generic enum. For a detailed description of Option, refer to the Option Type chapter. Option represents a value of some type that may be absent, so it can also indicate that a computation producing that type has failed. Because the underlying type is not known in advance, Option must be a generic type that declares a type parameter.
package std.core // `Option` is defined in std.core.
public enum Option<T> {
Some(T)
| None
public func getOrThrow(): T {
match (this) {
case Some(v) => v
case None => throw NoneValueException()
}
}
// ...
}
As shown, Option<T> has two variants: Some(T), which represents a successful return value, and None, which indicates an empty result. The getOrThrow function extracts the inner value from Some(T), returning a result of type T. If the value is None, it throws an exception.
For example, to define a safe division operation (since division computations may fail), we can return None when the divisor is 0, otherwise returning a result wrapped in Some:
func safeDiv(a: Int64, b: Int64): Option<Int64> {
var res: Option<Int64> = match (b) {
case 0 => None
case _ => Some(a/b)
}
return res
}
This approach ensures the program won’t throw arithmetic exceptions during runtime when encountering division by zero.
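As a minimal sketch of how a caller might consume the result, the program below pattern-matches on the returned Option. The helper name describe is hypothetical and is introduced only for this illustration; safeDiv is repeated so that the sketch is self-contained.

```cangjie
func safeDiv(a: Int64, b: Int64): Option<Int64> {
    let res: Option<Int64> = match (b) {
        case 0 => None
        case _ => Some(a / b)
    }
    return res
}

// `describe` is a hypothetical helper, not part of the standard library.
// It turns both variants of the Option into a printable message.
func describe(a: Int64, b: Int64): String {
    match (safeDiv(a, b)) {
        case Some(v) => "${a} / ${b} = ${v}"
        case None => "${a} / ${b} is undefined"
    }
}

main() {
    println(describe(10, 2)) // 10 / 2 = 5
    println(describe(10, 0)) // 10 / 0 is undefined
}
```

Because the caller must handle None explicitly, the failure case cannot be silently ignored.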
Subtyping Relationships of Generic Types
Instantiated generic types also have subtyping relationships. For example:
interface I<X, Y> { }
class C<Z> <: I<Z, Z> { }
Based on class C<Z> <: I<Z, Z> { }, we know that C<Bool> <: I<Bool, Bool> and C<D> <: I<D, D> hold, among others. This can be interpreted as “For all types Z without type variables, C<Z> <: I<Z, Z> holds.”
However, for the following code:
open class C { }
class D <: C { }
interface I<X> { }
I<D> <: I<C> does not hold (even though D <: C holds). This is because in the Cangjie language, user-defined type constructors are invariant at their type parameters.
The formal definition of variance is: If A and B are (instantiated) types, and T is a type constructor with a type parameter X (e.g., interface T<X>), then:
- If T(A) <: T(B) if and only if A = B, then T is invariant.
- If T(A) <: T(B) if and only if A <: B, then T is covariant at X.
- If T(A) <: T(B) if and only if B <: A, then T is contravariant at X.
In the current version of Cangjie, all user-defined generic types are invariant at all their type parameters. Therefore, given interface I<X> and types A, B, I<A> <: I<B> holds only if A = B. Conversely, if I<A> <: I<B> is known, we can deduce A = B (with the exception of built-in types: built-in tuple types are covariant at each of their element types; built-in function types are contravariant at their parameter types and covariant at their return types.)
Note:
For types other than class that implement interfaces, the subtyping relationship between the type and the interface cannot serve as a basis for covariance or contravariance.
Invariance limits some expressive power of the language but also avoids certain safety issues, such as the “covariant array runtime exception” problem.
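The rules above can be sketched in a few assignments. This is only an illustration under the rules stated in this section; the class name Box is hypothetical, and the commented-out line is expected to be rejected by the compiler.

```cangjie
open class C {}
class D <: C {}

interface I<X> {}
// `Box` is a hypothetical class used only for this sketch.
class Box<X> <: I<X> {}

main() {
    // A declared subtyping relation holds for each instantiation: Box<D> <: I<D>.
    let i: I<D> = Box<D>()

    // Built-in tuple types are covariant at each element type: (D, D) <: (C, C).
    let dd: (D, D) = (D(), D())
    let cc: (C, C) = dd

    // Built-in function types are contravariant at parameter types and
    // covariant at return types: (C) -> D <: (D) -> C.
    let f: (C) -> D = { c: C => D() }
    let g: (D) -> C = f

    // User-defined generic types are invariant, so this line would not compile:
    // let bad: I<C> = Box<D>()
    println("variance checks compiled")
}
```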
Type Aliases
When a type name is overly complex or not intuitive in a specific context, you can use a type alias to assign an alternative name to that type.
type I64 = Int64
A type alias definition begins with the keyword type, followed by the alias name (e.g., I64 in the example above), then an equals sign =, and finally the original type (i.e., the type being aliased, such as Int64 in the example above).
Type aliases can only be defined at the top level of a source file, and the original type must be visible at the point of alias definition. For example, in the following code, the alias definition for Int64 within main will result in an error, and the type LongNameClassB is not visible when defining its alias, which will also cause an error.
main() {
type I64 = Int64 // Error, type aliases can only be defined at the top level of the source file
}
class LongNameClassA { }
type B = LongNameClassB // Error, type 'LongNameClassB' is not defined
Direct or indirect circular references are prohibited in one or more type alias definitions.
type A = (Int64, A) // Error, 'A' refers to itself
type B = (Int64, C) // Error, 'B' and 'C' refer to each other circularly
type C = (B, Int64)
A type alias does not define a new type; it merely provides another name for the original type. It can be used in the following scenarios:
- As a type, for example:
    type A = B
    class B {}
    var a: A = B() // Use type alias A as type B
- When the type alias actually refers to a class or struct, it can be used as a constructor name:
    type A = B
    class B {}
    func foo() { A() } // Use type alias A as constructor of B
- When the type alias actually refers to a class, interface, or struct, it can be used as the type name to access internal static member variables or functions:
    type A = B
    class B {
        static var b : Int32 = 0
        static func foo() {}
    }
    func foo() {
        A.foo() // Use A to access static method in class B
        A.b
    }
- When the type alias actually refers to an enum, it can be used as the type name for the enum’s constructors:
    enum TimeUnit {
        Day | Month | Year
    }
    type Time = TimeUnit
    var a = Time.Day
    var b = Time.Month // Use type alias Time to access constructors in TimeUnit
Note that currently, user-defined type aliases are not supported in type conversion expressions. Refer to the following example:
type MyInt = Int32
MyInt(0) // Error, no matching function for operator '()' function call
Generic Type Aliases
Type aliases can also declare type parameters, but constraints cannot be applied to these parameters using where clauses. Constraints for generic type arguments will be explained later.
When a generic type name is too long, a type alias can be used to declare a shorter alternative. For example, a type RecordData can be abbreviated as RD using a type alias:
struct RecordData<T> {
var a: T
public init(x: T) {
a = x
}
}
type RD<T> = RecordData<T>
main(): Int64 {
var struct1: RD<Int32> = RecordData<Int32>(2)
return 1
}
In usage, RD<Int32> can be used to refer to the RecordData<Int32> type.
Generic Constraints
The purpose of generic constraints is to specify the operations and capabilities that generic type parameters must possess when declaring functions, classes, interfaces, structs, or enums. Only by declaring these constraints can corresponding member functions be called. In many scenarios, generic type parameters need to be constrained. Take the id function as an example:
func id<T>(a: T) {
return a
}
Inside id, the only thing that can be done with the parameter a is to return it. Operations such as a + 1 or println("${a}") are not allowed, because T could be any type. For example, T could be (Bool) -> Bool, which cannot be added to an integer and, being a function type, cannot be printed to the command line via the println function. However, if constraints are applied to this generic type parameter, more operations become possible.
Constraints are broadly divided into interface constraints and class type constraints. Before the declaration body of a function or type, the where keyword can be used to declare generic constraints. For declared generic type parameters T1, T2, constraints can be specified using syntax like where T1 <: Interface, T2 <: Class. If multiple constraints apply to the same type parameter, they can be connected with &, e.g., where T1 <: Interface1 & Interface2.
In Cangjie, the println function can accept an argument of type String. To print a value of a generic type as a string on the command line, you can constrain the generic type parameter with the ToString interface defined in core, which is an interface constraint:
package std.core // `ToString` is defined in core.
public interface ToString {
func toString(): String
}
This allows you to define a function named genericPrint using this constraint:
func genericPrint<T>(a: T) where T <: ToString {
println(a)
}
main() {
genericPrint<Int64>(10)
}
The result is:
10
If the type argument for the genericPrint function does not implement the ToString interface, the compiler will report an error. For example, when passing a function as a parameter:
func genericPrint<T>(a: T) where T <: ToString {
println(a)
}
main() {
genericPrint<(Int64) -> Int64>({ i => 0 })
}
If you compile the above file, the compiler will report an error indicating that the generic type argument does not satisfy the constraint, because (Int64) -> Int64 <: ToString does not hold.
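Several interface constraints on the same type parameter can be combined with &, as described earlier. The following is a minimal sketch of that form; the names Shape, Square, and report are hypothetical and exist only for this illustration.

```cangjie
// `Shape`, `Square`, and `report` are hypothetical names used only for this sketch.
interface Shape {
    func area(): Int64
}

class Square <: Shape & ToString {
    let side: Int64
    public init(side: Int64) {
        this.side = side
    }
    public func area(): Int64 {
        side * side
    }
    public func toString(): String {
        "Square(${side})"
    }
}

// T must satisfy both constraints, so both toString and area can be called on it.
func report<T>(shape: T): String where T <: Shape & ToString {
    return "${shape.toString()} has area ${shape.area()}"
}

main() {
    println(report<Square>(Square(3))) // Square(3) has area 9
}
```

If either constraint were removed from the where clause, the corresponding member call inside report would no longer type-check.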
In addition to interface-based constraints, class types can also be used to constrain generic type parameters. For example, when declaring a zoo type Zoo<T>, you might want the type parameter T to be constrained to subtypes of the Animal class, where Animal declares a run member function. Here, two subtypes Dog and Fox both implement the run member function, allowing instances stored in the animals array list within Zoo<T> to call the run member function:
import std.collection.ArrayList
abstract class Animal {
public func run(): String
}
class Dog <: Animal {
public func run(): String {
return "dog run"
}
}
class Fox <: Animal {
public func run(): String {
return "fox run"
}
}
class Zoo<T> where T <: Animal {
var animals: ArrayList<Animal> = ArrayList<Animal>()
public func addAnimal(a: T) {
animals.add(a)
}
public func allAnimalRuns() {
for(a in animals) {
println(a.run())
}
}
}
main() {
var zoo: Zoo<Animal> = Zoo<Animal>()
zoo.addAnimal(Dog())
zoo.addAnimal(Fox())
zoo.allAnimalRuns()
}
The program output is:
dog run
fox run
Note:
Constraints for generic type parameters can only be concrete class types or interfaces. If a type parameter has multiple class-type upper bounds, they must be in the same inheritance chain.
Extension Overview
Extensions can add new functionality to types (excluding functions, tuples, and interfaces) that are visible within the current package.
Extensions are used when you need to add additional functionality without breaking the encapsulation of the extended type.
The functionalities that can be added include:
- Adding member functions
- Adding operator overload functions
- Adding member properties
- Implementing interfaces
The following example shows each of the additions listed above. For detailed syntax, see the subsequent sections:
interface Foo {
func printValue(a: Int64): Unit
}
class Boo {
var boo: Int64 = 2
}
extend Boo {
public prop x: Int64 { // Adding a member property
get() {
123
}
}
func newMember(): Unit {
println("This is a member function of a new extension.") // Adding a member function
}
public operator func -() {
println("Overload the unary minus operator function.") // Adding an operator overload function
-x
}
}
// Interface extension, implementing an interface
extend<T> Array<T> <: Foo {
public func printValue(a: Int64) {
println("The value is ${a}.")
}
}
Although extensions can add extra functionality, they cannot alter the encapsulation of the extended type. Therefore, extensions do not support the following functionalities:
- Extensions cannot add member variables.
- Functions and properties in extensions must have implementations.
- Functions and properties in extensions cannot be modified with open, override, or redef.
- Extensions cannot access members modified with private in the extended type.
Based on whether an extension implements a new interface, extensions can be divided into two usage types: direct extensions and interface extensions. A direct extension does not include additional interfaces, while an interface extension includes interfaces. Interface extensions can be used to add new functionality to existing types and implement interfaces, enhancing abstract flexibility.
Direct Extension
A simple example of extension syntax is as follows:
extend String {
public func printSize() {
println("the size is ${this.size}")
}
}
As shown in the example above, extensions are declared using the extend keyword, followed by the extended type String and the extended functionality.
After extending the String type with the printSize function, instances of String within the current package can access this function as if it were native to String.
main() {
let a = "123"
a.printSize() // the size is 3
}
Compiling and executing the above code yields the output:
the size is 3
When extending generic types, there are two syntax approaches for adding functionality.
The first approach extends specific instantiated generic types. The extend keyword can be followed by any fully instantiated generic type. The added functionality is only available when the type exactly matches, and the type arguments must satisfy the constraints defined in the generic type declaration.
For example, consider Foo<T> below:
class Foo<T> where T <: ToString {}
extend Foo<Int64> {} // OK
class Bar {}
extend Foo<Bar> {} // Error, generics type arguments do not match the constraint of 'Class-Foo<Generics-T>'
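To illustrate that functionality added this way is available only when the type matches exactly, here is a minimal sketch. Wrapper and doubled are hypothetical names, and the commented-out call is expected to be rejected by the compiler.

```cangjie
// `Wrapper` and `doubled` are hypothetical names used only for this sketch.
class Wrapper<T> {
    public let value: T
    public init(value: T) {
        this.value = value
    }
}

// Extends only the fully instantiated type Wrapper<Int64>.
extend Wrapper<Int64> {
    public func doubled(): Int64 {
        value * 2
    }
}

main() {
    println(Wrapper<Int64>(21).doubled()) // 42
    // Wrapper<String>("a").doubled() // Error: Wrapper<String> was not extended
}
```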
The second approach uses generic extension by introducing type parameters after extend. Generic extensions can extend uninstantiated or partially instantiated generic types. The type parameters declared after extend must be directly or indirectly used in the extended generic type. The added functionality is only available when both the type and constraints fully match.
For example, consider MyList<T> below:
class MyList<T> {
public let data: Array<T> = Array<T>()
}
extend<T> MyList<T> {} // OK
extend<R> MyList<R> {} // OK
extend<T, R> MyList<(T, R)> {} // OK
extend MyList {} // Error, generic type should be used with type argument
extend<T, R> MyList<T> {} // Error, type parameter 'R' must be used in extended type
extend<T, R> MyList<T, R> {} // Error, type argument's number does not match type parameter's number
For generic type extensions, additional constraints can be declared to implement functions that are only available under specific conditions.
For example, we can define a type called Pair that conveniently stores two elements (similar to Tuple). We want the Pair type to accommodate any type, so the two generic parameters should have no constraints to ensure Pair can hold all types. However, we also want Pair to support equality comparison when both elements are comparable. This can be achieved using extensions.
As shown in the code below, the extension syntax constrains T1 and T2 to support the equals operation, enabling Pair to implement the equals function when these conditions are met.
class Pair<T1, T2> {
var first: T1
var second: T2
public init(a: T1, b: T2) {
first = a
second = b
}
}
interface Eq<T> {
func equals(other: T): Bool
}
extend<T1, T2> Pair<T1, T2> where T1 <: Eq<T1>, T2 <: Eq<T2> {
public func equals(other: Pair<T1, T2>) {
first.equals(other.first) && second.equals(other.second)
}
}
class Foo <: Eq<Foo> {
public func equals(other: Foo): Bool {
true
}
}
main() {
let a = Pair(Foo(), Foo())
let b = Pair(Foo(), Foo())
println(a.equals(b)) // true
}
Compiling and executing the above code yields the output:
true
Interface Extension
For example, the type Array does not itself implement the interface PrintSizeable. However, we can use an extension to add a member function printSize to Array and implement PrintSizeable.
interface PrintSizeable {
func printSize(): Unit
}
extend<T> Array<T> <: PrintSizeable {
public func printSize() {
println("The size is ${this.size}")
}
}
After extending Array to implement PrintSizeable, it is equivalent to Array having implemented PrintSizeable at its definition time.
Therefore, Array can be used as an implementation type of PrintSizeable, as shown in the following code.
main() {
let a: PrintSizeable = Array<Int64>()
a.printSize() // The size is 0
}
Compiling and executing the above code yields the output:
The size is 0
Multiple interfaces can be implemented simultaneously within the same extension. Separate the interfaces with &, and their order does not matter.
As shown in the following code, we can implement I1, I2, and I3 for Foo in a single extension.
interface I1 {
func f1(): Unit
}
interface I2 {
func f2(): Unit
}
interface I3 {
func f3(): Unit
}
class Foo {}
extend Foo <: I1 & I2 & I3 {
public func f1(): Unit {}
public func f2(): Unit {}
public func f3(): Unit {}
}
Additional generic constraints can also be declared in interface extensions to satisfy interfaces under specific conditions.
For example, we can make the Pair type implement the Eq interface, allowing Pair itself to become a type that satisfies the Eq constraint, as shown below.
class Pair<T1, T2> {
var first: T1
var second: T2
public init(a: T1, b: T2) {
first = a
second = b
}
}
interface Eq<T> {
func equals(other: T): Bool
}
extend<T1, T2> Pair<T1, T2> <: Eq<Pair<T1, T2>> where T1 <: Eq<T1>, T2 <: Eq<T2> {
public func equals(other: Pair<T1, T2>) {
first.equals(other.first) && second.equals(other.second)
}
}
class Foo <: Eq<Foo> {
public func equals(other: Foo): Bool {
true
}
}
main() {
let a = Pair(Foo(), Foo())
let b = Pair(Foo(), Foo())
println(a.equals(b)) // true
}
Compiling and executing the above code yields the output:
true
If the extended type already includes the functions or properties required by the interface, these functions or properties need not (and cannot) be re-implemented in the extension.
For example, in the following case, a new interface Sizeable is defined to retrieve the size of a type. Since Array already contains this function, we can extend Array to implement Sizeable without adding additional functions.
interface Sizeable {
prop size: Int64
}
extend<T> Array<T> <: Sizeable {}
main() {
let a: Sizeable = Array<Int64>()
println(a.size)
}
Compiling and executing the above code yields the output:
0
When interface extensions implement interfaces with inheritance relationships, the extensions are checked in the order of “first checking extensions implementing parent interfaces, then checking extensions implementing child interfaces.”
For example, if interface I1 has a child interface I2, and I1 contains a default implementation, while type A has two extensions implementing the parent and child interfaces respectively, the extension implementing I1 will be checked first, followed by the extension implementing I2.
interface I1 {
func foo(): Unit { println("I1 foo") }
}
interface I2 <: I1 {
func foo(): Unit { println("I2 foo") }
}
class A {}
extend A <: I1 {} // first check
extend A <: I2 {} // second check
main() {
A().foo()
}
Compiling and executing the above code yields the output:
I2 foo
In this example, when checking the extension implementing I1, the foo function is inherited from I1. When checking the extension implementing I2, since A already has an inherited default implementation of foo with the same signature, this foo will be overridden. Thus, when calling A’s foo function, it ultimately points to the implementation in I2 (the child interface).
If two interface extensions of the same type implement interfaces with conflicting inheritance relationships, making it impossible to determine the checking order, an error will be reported.
interface I1 {}
interface I2 <: I1 {}
interface I3 {}
interface I4 <: I3 {}
class A {}
extend A <: I1 & I4 {} // error: unable to decide which extension happens first
extend A <: I2 & I3 {} // error: unable to decide which extension happens first
If two interface extensions of the same type implement interfaces without inheritance relationships, they will be checked simultaneously.
interface I1 {
func foo() {}
}
interface I2 {
func foo() {}
}
class A {}
extend A <: I1 {} // Error, multiple default implementations, need to re-implement 'foo' in 'A'
extend A <: I2 {} // Error, multiple default implementations, need to re-implement 'foo' in 'A'
Note:
When class A has a generic base class B<T1,...,Tn>, and B<T1,...,Tn> extends an interface I<R1,...,Rn> with default implementations of instance or static functions (e.g., foo), if this function is not overridden in B<T1,...,Tn> or its extensions, and class A does not directly implement the interface I<R1,...,Rn>, calling the function foo through an instance of class A may lead to unexpected behavior.
interface I<N> {
func foo(n: N): N {n}
}
open class B<T> {}
extend<T> B<T> <: I<T> {}
class A <: B<Int64>{}
main() {
A().foo(0) // this call triggers unexpected behavior
}
Access Rules
Extension Modifiers
Extension declarations themselves cannot carry modifiers.
For example, in the following example, using the public modifier before directly extending A will result in a compilation error.
public class A {}
public extend A {} // Error, expected no modifier before extend
Modifiers that can be used for extension members include: static, public, protected, internal, private, mut.
- Members modified with private can only be used within the extension and are invisible externally.
- Members modified with internal can be used within the current package and its sub-packages (including sub-packages of sub-packages), which is the default behavior.
- Members modified with protected can be accessed within the current module (subject to export rules). When the extended type is a class, the subclass definition body of that class can also access them.
- Members modified with static can only be accessed via the type name and not through instance objects.
- Extensions for struct types can define mut functions.
package p1
public open class A {}
extend A {
public func f1() {}
protected func f2() {}
private func f3() {}
static func f4() {}
}
main() {
A.f4()
var a = A()
a.f1()
a.f2()
}
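The rule that extensions for struct types can define mut functions is not covered by the example above, so here is a minimal sketch of it. Counter and increment are hypothetical names introduced only for this illustration.

```cangjie
// `Counter` and `increment` are hypothetical names used only for this sketch.
struct Counter {
    public var count: Int64 = 0
}

extend Counter {
    // A mut function may modify the instance members of the struct.
    public mut func increment(): Unit {
        count += 1
    }
}

main() {
    var c = Counter() // must be declared with var to call a mut function
    c.increment()
    c.increment()
    println(c.count) // 2
}
```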
Member definitions within extensions do not support the use of open, override, or redef modifiers.
class Foo {
public open func f() {}
static func h() {}
}
extend Foo {
public override func f() {} // Error
public open func g() {} // Error
redef static func h() {} // Error
}
Orphan Rule for Extensions
Implementing an interface from one package for a type from another package, inside a third package, can cause confusion.
To prevent a type from accidentally implementing an inappropriate interface, Cangjie does not allow orphan extensions—i.e., extensions that are neither defined in the same package as the interface (including all interfaces in the interface inheritance chain) nor in the same package as the extended type.
As shown in the following code, you cannot implement Bar from package b for Foo from package a within package c.
You can only implement Bar for Foo in package a or package b.
// package a
public class Foo {}
// package b
public interface Bar {}
// package c
import a.Foo
import b.Bar
extend Foo <: Bar {} // Error
Access and Shadowing in Extensions
Instance members in extensions can use this just like in the type definition, and this functions the same way. this can also be omitted when accessing members. Instance members in extensions cannot use super.
class A {
var v = 0
}
extend A {
func f() {
print(this.v) // OK
print(v) // OK
}
}
Extensions cannot access members modified with private in the extended type.
class A {
private var v1 = 0
protected var v2 = 0
}
extend A {
func f() {
print(v1) // Error
print(v2) // OK
}
}
Extensions cannot shadow any members of the extended type.
class A {
func f() {}
}
extend A {
func f() {} // Error
}
Extensions are also not allowed to shadow any members added by other extensions.
class A {}
extend A {
func f() {}
}
extend A {
func f() {} // Error
}
Within the same package, a type can be extended multiple times, and within an extension, you can directly call non-private functions from other extensions of the extended type.
class Foo {}
extend Foo { // OK
private func f() {}
func g() {}
}
extend Foo { // OK
func h() {
g() // OK
f() // Error
}
}
When extending generic types, additional generic constraints can be used. The visibility rules between any two extensions of a generic type are as follows:
- If two extensions have the same constraints, they are mutually visible—i.e., functions or properties from one extension can be directly used in the other.
- If two extensions have different constraints, and one constraint is a superset of the other, the extension with the looser constraint is visible to the one with the stricter constraint, but not vice versa.
- If two extensions have different constraints and the constraints are not subsets of each other, the extensions are mutually invisible.
Example: Suppose there are two extensions, extension 1 and extension 2, for the same type E<X>. If the constraint for X in extension 1 is stricter than in extension 2, then functions and properties in extension 1 are invisible to extension 2, but functions and properties in extension 2 are visible to extension 1.
open class A {}
class B <: A {}
class E<X> {}
interface I1 {
func f1(): Unit
}
interface I2 {
func f2(): Unit
}
extend<X> E<X> <: I1 where X <: B { // extension 1
public func f1(): Unit {
f2() // OK
}
}
extend<X> E<X> <: I2 where X <: A { // extension 2
public func f2(): Unit {
f1() // Error
}
}
Import and Export of Extensions
Extensions can also be imported and exported, but extensions themselves cannot be modified with visibility modifiers. The export of extensions follows special rules.
For direct extensions, when the extension and the extended type are in the same package, whether the extension is exported is determined by the access modifiers of both the extended type and the generic constraints (if any). When all generic constraints are exported types (for modifier and export rules, see the Top-Level Declaration Visibility chapter), the extension will be exported. When the extension and the extended type are in different packages, the extension will not be exported.
As shown in the following code, Foo is exported. The extension containing the f1 function is not exported because its generic constraint is not exported. The extensions containing f2 and f3 functions are exported because their generic constraints are exported. The extension containing the f4 function is not exported because one of its generic constraints, I1, is not exported. The extension containing the f5 function is exported because all its generic constraints are exported.
// package a.b
package a.b
private interface I1 {}
internal interface I2 {}
protected interface I3 {}
extend Int64 <: I1 & I2 & I3 {}
public class Foo<T> {}
// The extension will not be exported
extend<T> Foo<T> where T <: I1 {
public func f1() {}
}
// The extension will be exported, and only packages that import both Foo and I2 will be able to access it.
extend<T> Foo<T> where T <: I2 {
public func f2() {}
}
// The extension will be exported, and only packages that import both Foo and I3 will be able to access it.
extend<T> Foo<T> where T <: I3 {
public func f3() {}
}
// The extension will not be exported. The I1 with the lowest access level determines the export.
extend<T> Foo<T> where T <: I1 & I2 & I3 {
public func f4() {}
}
// The extension is exported. Only the package that imports Foo, I2, and I3 can access the extension.
extend<T> Foo<T> where T <: I2 & I3 {
public func f5() {}
}
// package a.c
package a.c
import a.b.*
main() {
Foo<Int64>().f1() // Cannot access.
Foo<Int64>().f2() // Cannot access. Visible only for sub-pkg.
Foo<Int64>().f3() // OK.
Foo<Int64>().f4() // Cannot access.
Foo<Int64>().f5() // Cannot access. Visible only for sub-pkg.
}
// package a.b.d
package a.b.d
import a.b.*
main() {
Foo<Int64>().f1() // Cannot access.
Foo<Int64>().f2() // OK.
Foo<Int64>().f3() // OK.
Foo<Int64>().f4() // Cannot access.
Foo<Int64>().f5() // OK.
}
For interface extensions, there are two scenarios:
- When the interface extension and the extended type are in the same package, the extension will be exported along with the extended type and generic constraints (if any), regardless of the interface type’s access level. Packages outside do not need to import the interface type to access the extension’s members.
- When the interface extension and the extended type are in different packages, whether the extension is exported is determined by the lowest access level among the interface type and the generic constraints (if any). Other packages must import the extended type, the corresponding interface, and any constraint types (if applicable) to access the extension members of the interface.
As shown in the following code, in package a, even though the interface access modifier is private, the extension for Foo will still be exported.
// package a
package a
private interface I0 {}
public class Foo<T> {}
// The extension is exported.
extend<T> Foo<T> <: I0 {}
When extending the Foo type in another package, whether the extension is exported depends on the access modifiers of the implemented interface and generic constraints. The extension will be exported if at least one implemented interface is exported and all generic constraints are exportable.
// package b
package b
import a.Foo
private interface I1 {}
internal interface I2 {}
protected interface I3 {}
public interface I4 {}
// The extension will not be exported because I1 is not visible outside the file.
extend<T> Foo<T> <: I1 {}
// The extension is exported.
extend<T> Foo<T> <: I2 {}
// The extension is exported.
extend<T> Foo<T> <: I3 {}
// The extension is exported
extend<T> Foo<T> <: I1 & I2 & I3 {}
// The extension will not be exported. The I1 with the lowest access level determines the export.
extend<T> Foo<T> <: I4 where T <: I1 & I2 & I3 {}
// The extension is exported.
extend<T> Foo<T> <: I4 where T <: I2 & I3 {}
// The extension is exported.
extend<T> Foo<T> <: I4 & I3 where T <: I2 {}
Note that the exported members of interface extensions are limited to those contained within the interfaces.
// package a
package a
public class Foo {}
// package b
package b
import a.Foo
public interface I1 {
func f1(): Unit
}
public interface I2 {
func f2(): Unit
}
extend Foo <: I1 & I2 {
public func f1(): Unit {}
public func f2(): Unit {}
public func f3(): Unit {} // f3 will not be exported
}
// package c
package c
import a.Foo
import b.I1
main() {
let x: Foo = Foo()
x.f1() // OK, because f1 is a member of I1.
x.f2() // error, I2 is not imported
x.f3() // error, f3 not found
}
Similar to the export of extensions, importing extensions does not require explicit import statements. To import all accessible extensions, one only needs to import the extended type, interfaces, and generic constraints (if any).
As shown in the following code, in package b, importing Foo alone is sufficient to use the function f from the corresponding extension of Foo.
For interface extensions, it is necessary to import both the extended type and the extended interfaces (plus generic constraints if present) to use them. Therefore, in package c, both Foo and I must be imported to use the function g from the corresponding extension.
// package a
package a
public class Foo {}
extend Foo {
public func f() {}
}
// package b
package b
import a.Foo
public interface I {
func g(): Unit
}
extend Foo <: I {
public func g() {
this.f() // OK
}
}
// package c
package c
import a.Foo
import b.I
func test() {
let a = Foo()
a.f() // OK
a.g() // OK
}
Overview of Basic Collection Types
This chapter introduces several fundamental Collection types commonly used in the Cangjie language, including Array, ArrayList, HashSet, and HashMap.
You can choose the appropriate type for specific business scenarios:
- Array: When you don’t need to add or remove elements but require element modification
- ArrayList: When frequent element insertion, deletion, querying, and modification are needed
- HashSet: When you want each element to be unique
- HashMap: When you need to store a series of key-value mappings
The following table summarizes the basic characteristics of these types:
| Type Name | Mutable Elements | Add/Remove Elements | Element Uniqueness | Ordered Sequence |
|---|---|---|---|---|
| Array<T> | Y | N | N | Y |
| ArrayList<T> | Y | Y | N | Y |
| HashSet<T> | N | Y | Y | N |
| HashMap<K, V> | K: N, V: Y | Y | K: Y, V: N | N |
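As a rough illustration of the selection criteria above, the following sketch creates one instance of each type and performs the operation each is best suited for. The variable names (scores, names, tags, ages) are hypothetical and chosen only for this example.

```cangjie
import std.collection.*

main() {
    // Array: fixed length, but elements can be modified in place.
    let scores: Array<Int64> = [90, 85, 70]
    scores[0] = 95

    // ArrayList: elements can be added and removed.
    let names = ArrayList<String>()
    names.add("alice")
    names.add("bob")

    // HashSet: adding an element that already exists has no effect.
    let tags = HashSet<String>()
    tags.add("new")
    tags.add("new")

    // HashMap: each key maps to one value.
    let ages = HashMap<String, Int64>()
    ages.add("alice", 30)

    println("scores: ${scores.size}, names: ${names.size}, tags: ${tags.size}, ages: ${ages.size}")
}
```

Under these assumptions, the program should print `scores: 3, names: 2, tags: 1, ages: 1`.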
ArrayList
To use the ArrayList type, you need to import the collection package:
import std.collection.*
In Cangjie, ArrayList<T> represents the ArrayList type, where T denotes the element type of the ArrayList, which can be any type.
ArrayList has excellent expansion capabilities, making it suitable for scenarios requiring frequent addition and deletion of elements.
Compared to Array, ArrayList allows both in-place modification of elements and in-place addition/deletion of elements.
The mutability of ArrayList is a highly useful feature, enabling all references to the same ArrayList instance to share the same elements and apply unified modifications.
var a: ArrayList<Int64> = ... // ArrayList whose element type is Int64
var b: ArrayList<String> = ... // ArrayList whose element type is String
ArrayLists with different element types are distinct types and therefore cannot be assigned to each other.
Thus, the following example is invalid:
b = a // Type mismatch
In Cangjie, you can construct a specific ArrayList using constructors.
let a = ArrayList<String>() // Created an empty ArrayList whose element type is String
let b = ArrayList<String>(100) // Created an empty ArrayList whose element type is String, with space reserved for 100 elements
let c = ArrayList<Int64>([0, 1, 2]) // Created an ArrayList whose element type is Int64, containing elements 0, 1, 2
let d = ArrayList<Int64>(c) // Use another Collection to initialize an ArrayList
let e = ArrayList<String>(2, {x: Int64 => x.toString()}) // Created an ArrayList whose element type is String and size is 2. All elements are initialized by specified rule function
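The difference between reserving space and creating elements can be easy to miss. The following minimal sketch compares the size reported by two of the constructors shown above; the variable names reserved and generated are hypothetical.

```cangjie
import std.collection.ArrayList

main() {
    // Space for 100 elements is reserved, but no element is stored yet.
    let reserved = ArrayList<String>(100)
    // Two elements are produced by applying the rule function to indices 0 and 1.
    let generated = ArrayList<String>(2, {x: Int64 => x.toString()})

    println(reserved.size)
    println(generated.size)
    for (s in generated) {
        println(s)
    }
}
```

This should print 0 and 2, followed by the generated elements 0 and 1, since reserved capacity does not count toward size.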
Accessing ArrayList Members
When you need to access all elements of an ArrayList, you can use a for-in loop to iterate through them.
import std.collection.ArrayList
main() {
let list = ArrayList<Int64>([0, 1, 2])
for (i in list) {
println("The element is ${i}")
}
}
Compiling and executing the above code will output:
The element is 0
The element is 1
The element is 2
To determine the number of elements in an ArrayList, you can use the size property.
import std.collection.ArrayList
main() {
let list = ArrayList<Int64>([0, 1, 2])
if (list.size == 0) {
println("This is an empty arraylist")
} else {
println("The size of arraylist is ${list.size}")
}
}
Compiling and executing the above code will output:
The size of arraylist is 3
To access a single element at a specified position, you can use subscript syntax (the subscript must be of type Int64). The first element of a non-empty ArrayList always starts at position 0. You can access any element from 0 up to the last position (ArrayList’s size - 1). Using a negative index or an index greater than or equal to the size will trigger a runtime exception.
let a = list[0] // a == 0
let b = list[1] // b == 1
let c = list[-1] // Runtime exception
ArrayList also supports Range syntax in subscripts. For details, refer to the Array chapter.
Modifying ArrayList
You can use subscript syntax to modify elements at specific positions.
let list = ArrayList<Int64>([0, 1, 2])
list[0] = 3
ArrayList is a reference type. When used as an expression, ArrayList does not create a copy; all references to the same ArrayList instance share the same data.
Thus, modifications to ArrayList elements affect all references to that instance.
let list1 = ArrayList<Int64>([0, 1, 2])
let list2 = list1
list2[0] = 3
// list1 contains elements 3, 1, 2
// list2 contains elements 3, 1, 2
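The same sharing applies when an ArrayList is passed to a function. The sketch below assumes a hypothetical helper named appendZero; because the parameter refers to the caller's instance, the element it adds is visible after the call returns.

```cangjie
import std.collection.ArrayList

// Hypothetical helper: adds one element to the list it receives.
func appendZero(list: ArrayList<Int64>): Unit {
    list.add(0)
}

main() {
    let list = ArrayList<Int64>([1, 2])
    appendZero(list)   // no copy is made; the callee modifies the same instance
    println(list.size) // the caller observes the added element
}
```

The program should print 3.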
To add a single element to the end of an ArrayList, use the add function. To add multiple elements at once, use the add(all!: Collection<T>) function, which accepts other Collection types with the same element type, such as Array. For details on Collection types, refer to Overview of Basic Collection Types.
import std.collection.ArrayList
main() {
let list = ArrayList<Int64>()
list.add(0) // list contains element 0
list.add(1) // list contains elements 0, 1
let li = [2, 3]
list.add(all: li) // list contains elements 0, 1, 2, 3
}
You can use the add(T, at!: Int64) and add(all!: Collection<T>, at!: Int64) functions to insert a single element or a Collection of the same element type at a specified index. The element at that index and subsequent elements will be shifted to make space.
let list = ArrayList<Int64>([0, 1, 2]) // list contains elements 0, 1, 2
list.add(4, at: 1) // list contains elements 0, 4, 1, 2
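Inserting a whole Collection at an index works the same way. The following sketch uses the add(all!: Collection<T>, at!: Int64) form described above; the name extra is hypothetical.

```cangjie
import std.collection.ArrayList

main() {
    let list = ArrayList<Int64>([0, 1, 2])
    let extra = [7, 8]
    // Insert both elements starting at index 1; elements 1 and 2 shift right.
    list.add(all: extra, at: 1)
    for (i in list) {
        println(i)
    }
}
```

The list should then contain the elements 0, 7, 8, 1, 2, printed one per line.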
To remove an element from an ArrayList, use the remove function, specifying the index to remove. Subsequent elements will be shifted forward to fill the gap.
let list = ArrayList<String>(["a", "b", "c", "d"]) // list contains the elements "a", "b", "c", "d"
list.remove(at: 1) // Delete the element at subscript 1, now the list contains elements "a", "c", "d"
Increasing ArrayList Size
Each ArrayList requires a specific amount of memory to store its contents. When adding elements to an ArrayList causes it to exceed its reserved capacity, the ArrayList allocates a larger memory region and copies all elements to the new memory. This growth strategy means that add operations triggering reallocation incur performance costs, but as the ArrayList’s reserved memory grows larger, these operations occur less frequently.
If you know approximately how many elements you’ll be adding, you can pre-allocate sufficient memory before adding to avoid intermediate reallocations, thereby improving performance.
import std.collection.ArrayList
main() {
let list = ArrayList<Int64>(100) // Allocate space at once
for (i in 0..100) {
list.add(i) // Does not trigger reallocation of space
}
list.reserve(100) // Prepare more space
for (i in 0..100) {
list.add(i) // Does not trigger reallocation of space
}
}
HashSet
To use the HashSet type, you need to import the collection package:
import std.collection.*
You can use the HashSet type to construct a Collection that contains only unique elements.
Cangjie uses HashSet<T> to represent the HashSet type, where T denotes the element type of the HashSet. T must be a type that implements both the Hashable and Equatable<T> interfaces, such as numeric values or String.
var a: HashSet<Int64> = ... // HashSet whose element type is Int64
var b: HashSet<String> = ... // HashSet whose element type is String
HashSets with different element types are distinct types, so they cannot be assigned to each other.
Therefore, the following example is invalid:
b = a // Type mismatch
In Cangjie, you can construct a specific HashSet using constructor methods.
let a = HashSet<String>() // Created an empty HashSet whose element type is String
let b = HashSet<String>(100) // Created a HashSet whose capacity is 100
let c = HashSet<Int64>([0, 1, 2]) // Created a HashSet whose element type is Int64, containing elements 0, 1, 2
let d = HashSet<Int64>(c) // Use another Collection to initialize a HashSet
let e = HashSet<Int64>(10, {x: Int64 => (x * x)}) // Created a HashSet whose element type is Int64 and size is 10. All elements are initialized by specified rule function
Accessing HashSet Members
When you need to access all elements of a HashSet, you can use a for-in loop to iterate through all elements.
Note that HashSet does not guarantee element ordering based on insertion sequence, so traversal order may differ from insertion order.
import std.collection.*
main() {
let mySet = HashSet<Int64>([0, 1, 2])
for (i in mySet) {
println("The element is ${i}")
}
}
Compiling and executing the above code might output:
The element is 0
The element is 1
The element is 2
To determine the number of elements in a HashSet, use the size property.
import std.collection.*
main() {
let mySet = HashSet<Int64>([0, 1, 2])
if (mySet.size == 0) {
println("This is an empty hashset")
} else {
println("The size of hashset is ${mySet.size}")
}
}
Compiling and executing the above code will output:
The size of hashset is 3
To check if an element exists in a HashSet, use the contains function. It returns true if the element exists, otherwise false.
let mySet = HashSet<Int64>([0, 1, 2])
let a = mySet.contains(0) // a == true
let b = mySet.contains(-1) // b == false
Modifying HashSet
HashSet is a mutable reference type that provides functionality for adding and removing elements.
The mutability of HashSet is a useful feature, allowing all references to the same HashSet instance to share the same elements and receive unified modifications.
To add a single element to a HashSet, use the add function. To add multiple elements simultaneously, use the add(all!: Collection<T>) function, which accepts another Collection type (such as Array) with the same element type. When the element doesn’t exist, the add function performs the addition; when the element already exists in the HashSet, the add function has no effect.
let mySet = HashSet<Int64>()
mySet.add(0) // mySet contains element 0
mySet.add(0) // mySet still contains only element 0
mySet.add(1) // mySet contains elements 0, 1
let li = [2, 3]
mySet.add(all: li) // mySet contains elements 0, 1, 2, 3
HashSet is a reference type. When used as an expression, HashSet doesn’t create copies—all references to the same HashSet instance share the same data.
Therefore, modifications to HashSet elements affect all references to that instance.
let set1 = HashSet<Int64>([0, 1, 2])
let set2 = set1
set2.add(3)
// set1 contains elements 0, 1, 2, 3
// set2 contains elements 0, 1, 2, 3
To remove elements from a HashSet, use the remove function, specifying the element to be removed.
let mySet = HashSet<Int64>([0, 1, 2, 3])
mySet.remove(1) // mySet contains elements 0, 2, 3
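One common use of the uniqueness guarantee is removing duplicates from another Collection. The following is a minimal sketch with hypothetical names raw and unique.

```cangjie
import std.collection.*

main() {
    let raw = [3, 1, 3, 2, 1]
    // Constructing a HashSet from the Array keeps one copy of each value.
    let unique = HashSet<Int64>(raw)
    println(unique.size)
    println(unique.contains(2))
}
```

The program should print 3 and then true, because only the values 1, 2, and 3 remain.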
HashMap
To use the HashMap type, you need to import the collection package:
import std.collection.*
You can use the HashMap type to construct a Collection of key-value pairs.
HashMap is a hash table that provides fast access to its contained elements. Each element in the table is identified by its key, and the corresponding value can be accessed using the key.
Cangjie uses HashMap<K, V> to represent the HashMap type, where K denotes the key type of the HashMap. K must be a type that implements both the Hashable and Equatable<K> interfaces, such as numeric types or String. V denotes the value type of the HashMap, which can be any type.
var a: HashMap<Int64, Int64> = ... // HashMap whose key type is Int64 and value type is Int64
var b: HashMap<String, Int64> = ... // HashMap whose key type is String and value type is Int64
HashMaps with different element types are considered distinct types, so they cannot be assigned to each other.
Therefore, the following example is invalid:
b = a // Type mismatch
In Cangjie, you can construct a specific HashMap using constructors.
let a = HashMap<String, Int64>() // Created an empty HashMap whose key type is String and value type is Int64
let b = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)]) // whose key type is String and value type is Int64, containing elements ("a", 0), ("b", 1), ("c", 2)
let c = HashMap<String, Int64>(b) // Use another Collection to initialize a HashMap
let d = HashMap<String, Int64>(10) // Created a HashMap whose key type is String and value type is Int64 and capacity is 10
let e = HashMap<Int64, Int64>(10, {x: Int64 => (x, x * x)}) // Created a HashMap whose key and value type is Int64 and size is 10. All elements are initialized by specified rule function
Accessing HashMap Members
When you need to access all elements of a HashMap, you can use a for-in loop to iterate through all elements.
Note that HashMap does not guarantee the order of elements based on insertion, so the traversal order may differ from the insertion order.
import std.collection.HashMap
main() {
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
for ((k, v) in map) {
println("The key is ${k}, the value is ${v}")
}
}
Compiling and executing the above code might output:
The key is a, the value is 0
The key is b, the value is 1
The key is c, the value is 2
When you need to know the number of elements in a HashMap, you can use the size property to obtain this information.
import std.collection.HashMap
main() {
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
if (map.size == 0) {
println("This is an empty hashmap")
} else {
println("The size of hashmap is ${map.size}")
}
}
Compiling and executing the above code will output:
The size of hashmap is 3
To check if a HashMap contains a specific key, you can use the contains function. It returns true if the key exists, otherwise false.
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
let a = map.contains("a") // a == true
let b = map.contains("d") // b == false
To access the value associated with a specific key, you can use subscript syntax (the subscript type must match the key type). Using a non-existent key as an index will trigger a runtime exception.
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
let a = map["a"] // a == 0
let b = map["b"] // b == 1
let c = map["d"] // Runtime exception
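Since subscripting with a missing key raises a runtime exception, one way to avoid it is to check the key with contains first. The sketch below uses only the functions described above; the name key is hypothetical.

```cangjie
import std.collection.HashMap

main() {
    let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
    let key = "d"
    // Only subscript the map when the key is known to exist.
    if (map.contains(key)) {
        println("The value is ${map[key]}")
    } else {
        println("Key ${key} is not present")
    }
}
```

Here the program should print `Key d is not present` instead of throwing.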
Modifying HashMap
HashMap is a mutable reference type that provides functionality to modify, add, and remove elements.
The mutability of HashMap is a highly useful feature, allowing all references to the same HashMap instance to share the same elements and apply modifications uniformly.
You can use subscript syntax to modify the value associated with a key.
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
map["a"] = 3
HashMap is a reference type. When used as an expression, HashMap does not create a copy; all references to the same HashMap instance share the same data.
Therefore, modifications to HashMap elements will affect all references to that instance.
let map1 = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
let map2 = map1
map2["a"] = 3
// map1 contains the elements ("a", 3), ("b", 1), ("c", 2)
// map2 contains the elements ("a", 3), ("b", 1), ("c", 2)
To add a single key-value pair to a HashMap, use the add function. To add multiple key-value pairs simultaneously, use the add(all!: Collection<(K, V)>) function. If the key does not exist, the add function will perform an insertion. If the key exists, the add function will overwrite the old value with the new one.
let map = HashMap<String, Int64>()
map.add("a", 0) // map contains the element ("a", 0)
map.add("b", 1) // map contains the elements ("a", 0), ("b", 1)
let map2 = HashMap<String, Int64>([("c", 2), ("d", 3)])
map.add(all: map2) // map contains the elements ("a", 0), ("b", 1), ("c", 2), ("d", 3)
Alternatively, you can use assignment syntax to directly add new key-value pairs to a HashMap.
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2)])
map["d"] = 3 // map contains the elements ("a", 0), ("b", 1), ("c", 2), ("d", 3)
To remove an element from a HashMap, use the remove function and specify the key to be deleted.
let map = HashMap<String, Int64>([("a", 0), ("b", 1), ("c", 2), ("d", 3)])
map.remove("d") // map contains the elements ("a", 0), ("b", 1), ("c", 2)
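The operations above can be combined into a small counting routine. This sketch, with hypothetical names words and counts, tallies how often each string occurs in an Array.

```cangjie
import std.collection.HashMap

main() {
    let words = ["a", "b", "a", "c", "a"]
    let counts = HashMap<String, Int64>()
    for (w in words) {
        if (counts.contains(w)) {
            counts[w] = counts[w] + 1 // key exists: update its value
        } else {
            counts[w] = 1             // key is new: assignment inserts it
        }
    }
    println(counts["a"])
    println(counts.size)
}
```

The program should print 3 (occurrences of "a") and 3 (distinct keys).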
Iterable and Collections
We have previously learned about Range, Array, and ArrayList, all of which can be traversed using for-in operations. For types defined by developers, similar traversal operations can also be implemented.
Range, Array, and ArrayList all support the for-in syntax through Iterable.
Iterable is a built-in interface with the following form (only core code is shown):
interface Iterable<T> {
func iterator(): Iterator<T>
...
}
The iterator function returns an Iterator, which is another built-in interface with the following form (only core code is shown):
interface Iterator<T> <: Iterable<T> {
mut func next(): Option<T>
...
}
You can use the for-in syntax to traverse any instance of a type that implements the Iterable interface.
Assume we have the following for-in code:
let list = [1, 2, 3]
for (i in list) {
println(i)
}
It is equivalent to the following while code:
let list = [1, 2, 3]
var it = list.iterator()
while (true) {
match (it.next()) {
case Some(i) => println(i)
case None => break
}
}
Another common method for traversing Iterable types is to use pattern matching in the condition of a while expression. For example, another equivalent form of the above while code is:
let list = [1, 2, 3]
var it = list.iterator()
while (let Some(i) <- it.next()) {
println(i)
}
Array, ArrayList, HashSet, and HashMap types all implement Iterable, so they can be used in for-in or while loops.
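As a sketch of the while form applied to a different collection, the following code sums the elements of a HashSet through its iterator. The names numbers and sum are hypothetical; the traversal order is not guaranteed, but the sum does not depend on it.

```cangjie
import std.collection.*

main() {
    let numbers = HashSet<Int64>([1, 2, 3])
    var it = numbers.iterator()
    var sum = 0
    // next() yields Some(value) until the iterator is exhausted, then None.
    while (let Some(n) <- it.next()) {
        sum += n
    }
    println(sum)
}
```

The program should print 6.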
Overview of Packages
As project scale continues to expand, managing source code in a single oversized file becomes increasingly difficult. In such cases, source code can be grouped by functionality, with different functional code segments managed separately. Each independently managed group generates an output file. During usage, corresponding functionality can be accessed by importing the appropriate output file, or more complex features can be achieved through interaction and combination of different functionalities, thereby making project management more efficient.
In the Cangjie programming language, a package is the smallest unit of compilation. Each package can independently output artifacts such as AST files, static library files, or dynamic library files. Each package has its own namespace, and within the same package, no duplicate top-level definitions or declarations are allowed (except for function overloading). A package may contain multiple source files.
A module is a collection of packages and represents the smallest unit of distribution for third-party developers. A module’s program entry point must be located in its root directory, and it can have at most one main function serving as the program entry at the top level. This main function either takes no parameters or has parameters of type Array<String>, and its return type must be an integer type or Unit.
Package Declaration
In the Cangjie programming language, package declarations begin with the keyword package, followed by the package names from the root package to the current package, separated by dots (.). Package names must be valid ordinary identifiers (excluding raw identifiers). For example:
package pkg1 // root package pkg1
package pkg1.sub1 // sub-package sub1 under root package pkg1
Note:
In the current Windows platform version, package names do not support Unicode characters. Package names must be valid ordinary identifiers containing only ASCII characters.
The package declaration must appear as the first non-empty, non-comment line in a source file, and all source files within the same package must maintain consistent package declarations.
// file 1
// Comments are accepted
package test
// declarations...
// file 2
let a = 1 // Error, package declaration must appear first in a file
package test
// declarations...
In Cangjie, package names should reflect the relative path of the current source file from the project’s source root directory src, with path separators replaced by dots. For example, if the package’s source code is located under src/directory_0/directory_1 and the root package name is pkg, the package declaration in the source code should be package pkg.directory_0.directory_1.
Important considerations:
- The folder name containing the package must match the package name.
- The default name for the source root directory is src.
- Packages in the source root directory may omit package declarations, in which case the compiler will assign them the default package name default.
Given the following source directory structure:
// The directory structure is as follows:
src
|-- directory_0
|   |-- directory_1
|   |   |-- a.cj
|   |   `-- b.cj
|   `-- c.cj
`-- main.cj
The package declarations in a.cj, b.cj, c.cj, and main.cj can be:
// a.cj
// in file a.cj, the declared package name must correspond to relative path directory_0/directory_1.
package default.directory_0.directory_1
// b.cj
// in file b.cj, the declared package name must correspond to relative path directory_0/directory_1.
package default.directory_0.directory_1
// c.cj
// in file c.cj, the declared package name must correspond to relative path directory_0.
package default.directory_0
// main.cj
// file main.cj is in the module root directory and may omit package declaration.
main(): Int64 {
return 0
}
Additionally, package declarations must not cause naming conflicts: sub-packages cannot share names with top-level declarations in the current package.
Here are some error examples:
// a.cj
package a
public class B { // Error, 'B' is conflicted with sub-package 'a.B'
public static func f() {}
}
// b.cj
package a.B
public func f() {}
// main.cj
import a.B // ambiguous use of 'a.B'
main() {
a.B.f()
}
Visibility of Top-Level Declarations
In the Cangjie programming language, access modifiers can be used to control the visibility of top-level declarations such as types, variables, and functions. Cangjie provides four access modifiers: private, internal, protected, and public. The semantics of these modifiers when applied to top-level elements are as follows:
- private: Visible only within the current file. Members with this modifier cannot be accessed from different files.
- internal: Visible only within the current package and its subpackages (including nested subpackages). Members can be accessed without import within the same package, while subpackages (including nested subpackages) can access these members through imports.
- protected: Visible only within the current module. Files in the same package can access these members without import, while other packages within the same module can access them through imports. Packages from different modules cannot access these members.
- public: Visible both inside and outside the module. Files in the same package can access these members without import, while other packages can access them through imports.
| Modifier | File | Package & Subpackages | Module | All Packages |
|---|---|---|---|---|
| private | Y | N | N | N |
| internal | Y | Y | N | N |
| protected | Y | Y | Y | N |
| public | Y | Y | Y | Y |
Different top-level declarations support specific access modifiers and have default modifiers (default modifiers apply when the modifier is omitted; these defaults can also be explicitly specified):
- package: Supports internal, protected, and public, with public as the default modifier.
- import: Supports all access modifiers, with private as the default modifier.
- Other top-level declarations support all access modifiers, with internal as the default modifier.
package a
private func f1() { 1 } // f1 is visible only within the current file
func f2() { 2 } // f2 is visible only within the current package and subpackages
protected func f3() { 3 } // f3 is visible only within the current module
public func f4() { 4 } // f4 is visible both inside and outside the module
The access level hierarchy in Cangjie is public > protected > internal > private. The access modifier of a declaration cannot be higher than the access level of the types used within that declaration. Refer to the following examples:
- Parameters and return values in function declarations
// a.cj
package a
class C {}
public func f1(a1: C) // Error, public declaration f1 cannot use internal type C.
{
return 0
}
public func f2(a1: Int8): C // Error, public declaration f2 cannot use internal type C.
{
return C()
}
public func f3 (a1: Int8) // Error, public declaration f3 cannot use internal type C.
{
return C()
}
- Variable declarations
// a.cj
package a
class C {}
public let v1: C = C() // Error, public declaration v1 cannot use internal type C.
public let v2 = C() // Error, public declaration v2 cannot use internal type C.
- Type arguments for generic types
// a.cj
package a
public class C1<T> {}
class C2 {}
public let v1 = C1<C2>() // Error, public declaration v1 cannot use internal type C2.
- Type bounds in where constraints
// a.cj
package a
interface I {}
public class B<T> where T <: I {} // Error, public declaration B cannot use internal type I.
Notably:
- public declarations can use any types visible within their package in their initialization expressions or function bodies, including both public and non-public types.
// a.cj
package a
class C1 {}
func f1(a1: C1)
{
return 0
}
public func f2(a1: Int8) // OK.
{
var v1 = C1()
return 0
}
public let v1 = f1(C1()) // OK.
public class C2 // OK.
{
var v2 = C1()
}
- public top-level declarations can use anonymous functions or any top-level functions, including both public and non-public top-level functions.
public var t1: () -> Unit = { => } // OK.
func f1(): Unit {}
public let t2 = f1 // OK.
public func f2() // OK.
{
return f1
}
- Built-in types such as Rune and Int64 default to public visibility.
var num = 5
public var t3 = num // OK.
Note:
Within the same package, private custom types (e.g., struct, class, enum, and interface) with the same name are not supported in certain scenarios. Unsupported cases will trigger compiler errors.
For example, in the following program, example1.cj and example2.cj share the same package name. example1.cj defines a private class A, while example2.cj defines a private struct A.
// example1.cj
package test
private class A {}
public class D<T> {
private let a: A = A()
}
// example2.cj
package test
private struct A {}
public class C<T> {
private let a: A = A()
}
Compiling this program will report the following error:
error: currently, it is not possible to export two private declarations with the same name
Package Import
Using import Statements to Import Declarations or Definitions from Other Packages
In the Cangjie programming language, you can import a top-level declaration or definition from another package using the syntax import fullPackageName.itemName, where fullPackageName is the fully qualified package name and itemName is the name of the declaration. The import statements must be placed after the package declaration and before any other declarations or definitions in the source file. For example:
package a
import std.math.*
import package1.foo
import {package1.foo, package2.bar}
If multiple itemNames belong to the same fullPackageName, you can use the syntax import fullPackageName.{itemName[, itemName]*}. For example:
import package1.{foo, bar, fuzz}
This is equivalent to:
import package1.foo
import package1.bar
import package1.fuzz
In addition to importing a specific top-level declaration or definition using import fullPackageName.itemName, you can also use import packageName.* to import all visible top-level declarations or definitions from the packageName package. For example:
import package1.*
import {package1.*, package2.*}
Note the following:
- The scope level of imported members is lower than that of members declared in the current package.
- If the module name or package name of an exported package is altered, making it inconsistent with the name specified during export, an error will occur during import.
- Only top-level declarations or definitions visible to the current file can be imported. Attempting to import invisible declarations or definitions will result in an error at the import statement.
- It is prohibited to import declarations or definitions from the package where the current source file resides using import.
- Circular dependency imports between packages are prohibited. If circular dependencies exist between packages, the compiler will report an error.
Example:
// pkga/a.cj
package pkga // Error, packages pkga pkgb are in circular dependencies.
import pkgb.*
class C {}
public struct R {}
// pkgb/b.cj
package pkgb
import pkga.*
// pkgc/c1.cj
package pkgc
import pkga.C // Error, 'C' is not accessible in package 'pkga'.
import pkga.R // OK, R is an external top-level declaration of package pkga.
import pkgc.f1 // Error, package 'pkgc' should not import itself.
public func f1() {}
// pkgc/c2.cj
package pkgc
func f2() {
/* OK, the imported declaration is visible to all source files of the same package
* and accessing import declaration by its name is supported.
*/
R()
// OK, accessing imported declaration by fully qualified name is supported.
pkga.R()
// OK, the declaration of current package can be accessed directly.
f1()
// OK, accessing declaration of current package by fully qualified name is supported.
pkgc.f1()
}
In the Cangjie programming language, if an imported declaration or definition has the same name as a top-level declaration or definition in the current package and does not constitute function overloading, the imported declaration or definition will be shadowed. If they do constitute function overloading, function resolution will follow the rules of function overloading during function calls.
// pkga/a.cj
package pkga
public struct R {} // R1
public func f(a: Int32) {} // f1
public func f(a: Bool) {} // f2
// pkgb/b.cj
package pkgb
import pkga.*
func f(a: Int32) {} // f3
struct R {} // R2
func bar() {
R() // OK, R2 shadows R1.
f(1) // OK, invoke f3 in current package.
f(true) // OK, invoke f2 in the imported package
}
Implicit Import of the core Package
Types such as String and Range can be used directly not because they are built-in types, but because the compiler automatically and implicitly imports all public declarations from the core package for the source code.
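As a small illustration, the following sketch contains no import statement at all, yet it uses String, Range, and println. The variable names greeting and steps are hypothetical.

```cangjie
// No import statement is needed for String, Range, or println.
main() {
    let greeting: String = "step"
    let steps: Range<Int64> = 0..3
    for (i in steps) {
        println("${greeting} ${i}")
    }
}
```

The program should print `step 0`, `step 1`, and `step 2` on separate lines.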
Using import as to Rename Imported Names
Different packages have separate namespaces, so top-level declarations with the same name may exist across different packages. When importing top-level declarations with the same name from different packages, you can use import packageName.name as newName to rename them and avoid conflicts. Even without name conflicts, you can still use import as to rename imported content. The rules for import as are as follows:
- After renaming an imported declaration using import as, only the new name can be used in the current package; the original name becomes unavailable.
- If the renamed name conflicts with other names in the top-level scope of the current package and all corresponding declarations are function types, they participate in function overloading; otherwise, a redefinition error is reported.
- The syntax import pkg as newPkgName is supported to rename package names, resolving naming conflicts for packages with the same name in different modules.
// a.cj
package p1
public func f1() {}
// d.cj
package p2
public func f3() {}
// b.cj
package p1
public func f2() {}
// c.cj
package pkgc
public func f1() {}
// main.cj
import p1 as A
import p1 as B
import p2.f3 as f // OK
import pkgc.f1 as a
import pkgc.f1 as b // OK
func f(a: Int32) {}
main() {
A.f1() // OK, package name conflict is resolved by renaming package name.
B.f2() // OK, package name conflict is resolved by renaming package name.
p1.f1() // Error, the original package name cannot be used.
a() // OK.
b() // OK.
pkgc.f1() // Error, the original name cannot be used.
}
- If conflicting imported names are not renamed, no error is reported at the import statement. However, an error will occur at the usage site due to the inability to import a unique name. This can be resolved by defining aliases with import as or importing the package as a namespace using import fullPackageName.
// a.cj
package p1
public class C {}
// b.cj
package p2
public class C {}
// main1.cj
package pkga
import p1.C
import p2.C
main() {
let _ = C() // Error
}
// main2.cj
package pkgb
import p1.C as C1
import p2.C as C2
main() {
let _ = C1() // OK
let _ = C2() // OK
}
// main3.cj
package pkgc
import p1
import p2
main() {
let _ = p1.C() // OK
let _ = p2.C() // OK
}
Re-exporting an Imported Name
In large-scale projects with extensive functionality, this scenario is very common: package p2 heavily uses declarations imported from package p1. When package p3 imports p2 and uses its functionality, the declarations from p1 also need to be visible to p3. Requiring p3 to manually import all p1 declarations used by p2 would be overly cumbersome. Therefore, it is desirable to import p1 declarations used by p2 when p2 is imported.
In the Cangjie programming language, import can be modified with the access modifiers private, internal, protected, or public. Among these, import statements modified with public, protected, or internal can re-export the imported members (provided these members are not rendered unavailable in the current package due to name conflicts or shadowing). Other packages can directly import and use the re-exported content based on visibility without needing to import it from the original package.
- private import means the imported content is only accessible within the current file. private is the default modifier for import; an import without an access modifier is equivalent to private import.
- internal import means the imported content is accessible within the current package and its subpackages (including subpackages of subpackages). Access from outside the current package requires an explicit import.
- protected import means the imported content is accessible within the current module. Access from outside the current package requires an explicit import.
- public import means the imported content is accessible externally. Access from outside the current package requires an explicit import.
In the following example, b is a subpackage of a, and a re-exports the function f defined in b using public import.
package a
public import a.b.f
public let x = 0

internal package a.b
public func f() { 0 }

import a.f // OK
let _ = f() // OK
Note that packages cannot be re-exported: if the import statement imports a package, it cannot be modified with public, protected, or internal.
public import a.b // Error, cannot re-export package
Program Entry Point
The entry point of a Cangjie program is main. At the top level of the package in the root directory of the source files, there can be at most one main function.
When compiling a module into an executable, the compiler only searches for main at the top level of the root directory’s source files. If no main is found, the compiler will report an error. If main is found, the compiler will further verify its parameter and return types. Note that main cannot be modified by access modifiers. When a package is imported, any main defined within that package will not be imported.
The main function serving as the program entry point can either have no parameters or accept a parameter of type Array<String>. Its return type must be either Unit or an integer type.
Example of main without parameters:
// main.cj
main(): Int64 { // OK.
return 0
}
Example of main with Array<String> parameter:
// main.cj
main(args: Array<String>): Unit { // OK.
for (arg in args) {
println(arg)
}
}
After compiling with cjc main.cj, running ./main Hello, World on the command line produces the following output:
Hello,
World
Below are some incorrect examples:
// main.cj
main(): String { // Error, return type of 'main' is not 'Integer' or 'Unit'.
return ""
}
// main.cj
main(args: Array<Int8>): Int64 { // Error, 'main' cannot be defined with parameter whose type is not Array<String>.
return 0
}
// main.cj
// Error, multiple 'main's are found in source files.
main(args: Array<String>): Int32 {
return 0
}
main(): Int8 {
return 0
}
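For contrast with the incorrect definitions, here is a minimal sketch of a valid entry point that uses both the argument array and an integer return value. The usage message and the greeting text are illustrative, and treating the returned integer as the process exit status is an assumption that the text above does not state.

```cangjie
// main.cj
main(args: Array<String>): Int64 {
    // As in the earlier example, args holds only the arguments that follow the program name.
    if (args.size == 0) {
        println("usage: ./main <name>...")
        return 1 // assumed to signal failure to the caller
    }
    for (name in args) {
        println("Hello, ${name}")
    }
    return 0
}
```

Running the compiled program as ./main Tom would print Hello, Tom.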
Defining Exceptions
Exceptions are a special category of errors that can be caught and handled by programmers, representing a series of abnormal behaviors that occur during program execution. Examples include array index out of bounds, division by zero, arithmetic overflow, illegal input, etc. To ensure system correctness and robustness, many software systems contain extensive code for error detection and handling.
Exceptions are not part of a program’s normal functionality. Once an exception occurs, the program must handle it immediately by transferring control from the normal execution flow to the exception handling section. The Cangjie programming language provides an exception handling mechanism to address various exceptional conditions that may arise during runtime.
In Cangjie, exception classes include Error and Exception:
- The Error class describes internal runtime system errors and resource exhaustion errors in Cangjie. Applications should not throw this type of error. If internal errors occur, they should only be reported to users, and the program should attempt to terminate safely.
- The Exception class describes runtime logical errors or IO errors that cause exceptions, such as array index out of bounds or attempting to open a non-existent file. These exceptions need to be caught and handled within the program.
Developers cannot define custom exceptions by inheriting from Cangjie’s built-in Error class or its subclasses. However, they can inherit from the built-in Exception class or its subclasses to create custom exceptions, for example:
open class FatherException <: Exception {
public init() {
super("This is FatherException.")
}
public init(message: String) {
super(message)
}
public open override func getClassName(): String {
"FatherException"
}
}
class ChildException <: FatherException {
public init() {
super("This is ChildException.")
}
public open override func getClassName(): String {
"ChildException"
}
}
The following table shows the main functions of Exception and their descriptions:
| Function Type | Function and Description |
|---|---|
| Constructor | init() Default constructor. |
| Constructor | init(causedBy: Exception) Constructor that allows setting a cause exception. |
| Constructor | init(message: String) Constructor that allows setting an exception message. |
| Constructor | init(message: String, causedBy: Exception) Constructor that allows setting an exception message and a cause exception. |
| Property | mut prop causedBy: ?Exception Allows setting a cause exception and returns the cause exception. |
| Property | open prop message: String Returns detailed information about the exception. The message is initialized in the exception class constructor and defaults to an empty string. |
| Method | open func toString(): String Returns the exception type name and detailed information, where the detailed information defaults to using the message property. |
| Method | func getClassName(): String Returns the user-defined class name. Subclasses need to override this method to return their own names. |
| Method | func printStackTrace(): Unit Prints stack trace information to the standard error stream. |
The following table shows the main functions of Error and their descriptions:
| Function Type | Function and Description |
|---|---|
| Property | open prop message: String Returns detailed information about the error. The message is internally initialized when the error occurs and defaults to an empty string. |
| Method | open func toString(): String Returns the error type name and detailed information, where the detailed information defaults to using the message property. |
| Method | func printStackTrace(): Unit Prints stack trace information to the standard error stream. |
throw and Exception Handling
The previous section discussed how to define custom exceptions. Now let’s learn how to throw and handle exceptions.
- Since exceptions are of class type, you only need to create them in the same way as class objects. For example, the expression FatherException() creates an exception of type FatherException.
- The Cangjie language supports creating exception chains (only for Exception, not including Error), such as the expression let fatherException = FatherException("this is message", causeException), where causeException is another exception and serves as the cause that triggered fatherException. When printing the exception stack trace, it will recursively print the cause.
- The Cangjie language provides the throw keyword for throwing exceptions. When using throw, the expression following it must be a subtype of Exception (note that Error, though also an exception type, cannot be manually thrown via throw). For example, throw ArithmeticException("I am an Exception!") will throw an arithmetic exception when executed.
- Exceptions thrown via the throw keyword must be caught and handled. If an exception is not caught, the system will invoke the default exception handler.

Note:
Developers can call the following static function of the Thread class to register a custom handler for uncaught exceptions of type Exception:
public static func handleUncaughtExceptionBy(exHandler: (Thread, Exception) -> Unit): Unit
Exception handling is performed using try expressions, which come in two forms:
- Regular try expressions without automatic resource management.
- Try-with-resources expressions that perform automatic resource management.
Regular try Expressions
A regular try expression consists of three parts: a try block, catch blocks, and a finally block.
- Try block: Begins with the try keyword followed by a block of expressions and declarations (enclosed in curly braces, defining a new local scope that can contain any expressions or declarations, hereafter referred to as “block”). The block following try may throw exceptions, which can be caught and handled by subsequent catch blocks (if no catch block exists or the exception is uncaught, the exception continues to propagate after executing the finally block).
- Catch blocks: A regular try expression may contain zero or more catch blocks (when no catch block exists, a finally block must be present). Each catch block starts with the catch keyword, followed by a catchPattern and a block. The catchPattern matches the exception to be caught via pattern matching. Once matched, the exception is handled by the subsequent block, and any remaining catch blocks are ignored. If a catch block’s exception type can be caught by a preceding catch block, a “catch block unreachable” warning will be issued for that catch block.
- Finally block: Begins with the finally keyword followed by a block. The finally block is primarily used for cleanup tasks, such as releasing resources, and should avoid throwing further exceptions. The finally block executes regardless of whether an exception occurs (i.e., whether the try block throws an exception). If the exception remains unhandled, it continues to propagate after the finally block executes. A try expression may omit the finally block only if it includes catch blocks; otherwise, the finally block is mandatory.
The scopes of the block following try and each catch block are independent.
Here is a simple example with only a try block and a catch block:
main() {
try {
throw NegativeArraySizeException("I am an Exception!")
} catch (e: NegativeArraySizeException) {
println(e)
println("NegativeArraySizeException is caught!")
}
println("This will also be printed!")
}
Execution result:
NegativeArraySizeException: I am an Exception!
NegativeArraySizeException is caught!
This will also be printed!
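The rule that a matching catch block ends the search can be illustrated with two handlers ordered from specific to general. This is only a sketch: StorageException and DiskFullException are hypothetical classes introduced for the example and are not part of the Cangjie library.

```cangjie
// Hypothetical exception hierarchy used only for illustration.
open class StorageException <: Exception {
    public init(message: String) {
        super(message)
    }
}

class DiskFullException <: StorageException {
    public init(message: String) {
        super(message)
    }
}

main(): Unit {
    try {
        throw DiskFullException("no space left")
    } catch (e: DiskFullException) {
        // The first matching catch block handles the exception.
        println("specific handler: ${e.message}")
    } catch (e: StorageException) {
        // Skipped here; it would handle any other StorageException.
        println("general handler: ${e.message}")
    }
}
```

Only the specific handler runs, so the program prints specific handler: no space left. Swapping the two catch blocks would make the DiskFullException handler unreachable, which is the situation the warning described above is meant to flag.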
Variables introduced in catchPattern have the same scope level as variables in the block following catch. Redefining the same variable name in the catch block will trigger a redefinition error. For example:
main() {
try {
throw NegativeArraySizeException("I am an Exception!")
} catch (e: NegativeArraySizeException) {
println(e)
let e = 0 // Error, redefinition
println(e)
println("NegativeArraySizeException is caught!")
}
println("This will also be printed!")
}
Below is a simple example of a try expression with a finally block:
main() {
try {
throw NegativeArraySizeException("NegativeArraySizeException")
} catch (e: NegativeArraySizeException) {
println("Exception info: ${e}.")
} finally {
println("The finally block is executed.")
}
}
Execution result:
Exception info: NegativeArraySizeException: NegativeArraySizeException.
The finally block is executed.
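A try expression may also consist of only a try block and a finally block. The following sketch, in which the function name risky is made up for the example, shows the finally block running before the exception propagates to the caller:

```cangjie
// risky is a hypothetical helper: it has no catch block, so the exception leaves the function.
func risky(): Unit {
    try {
        throw IllegalArgumentException("bad input")
    } finally {
        println("cleanup runs first")
    }
}

main(): Unit {
    try {
        risky()
    } catch (e: IllegalArgumentException) {
        println("caught in caller: ${e.message}")
    }
}
```

The expected output is cleanup runs first, followed by caught in caller: bad input.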
Try expressions can appear anywhere expressions are allowed. The type of a try expression is determined similarly to multi-branch constructs like if and match expressions: it is the least common supertype of all branch types (excluding the finally branch). For example, in the following code, the try expression and variable x both have type D, the least common supertype of E and D. The C() in the finally branch does not participate in the least common supertype calculation (if it did, the least common supertype would become C).
Additionally, when the value of a try expression is unused, its type is Unit, and the branches are not required to have a least common supertype.
open class C {}
open class D <: C {}
class E <: D {}
main () {
let x = try {
E()
} catch (e: Exception) {
D()
} finally {
C()
}
0
}
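As a smaller illustration of a try expression used as a value, the sketch below falls back to a default when a check fails. The function checkedAge is a hypothetical helper written for this example; both branches have type Int64, so the whole try expression has type Int64.

```cangjie
// Hypothetical validation helper: returns age unchanged or throws.
func checkedAge(age: Int64): Int64 {
    if (age < 0) {
        throw IllegalArgumentException("age must not be negative")
    }
    age
}

main(): Unit {
    // The value of the try block is used when no exception is thrown.
    let a = try {
        checkedAge(30)
    } catch (_: IllegalArgumentException) {
        0
    }
    // The value of the catch block is used when the exception is caught.
    let b = try {
        checkedAge(-5)
    } catch (_: IllegalArgumentException) {
        0
    }
    println(a)
    println(b)
}
```

This prints 30 and then 0.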
Try-with-resources Expressions
Try-with-resources expressions are primarily used for automatic release of non-memory resources. Unlike regular try expressions, catch and finally blocks are optional in try-with-resources expressions. Additionally, between the try keyword and the block, one or more ResourceSpecification clauses can be inserted to acquire resources (the ResourceSpecification does not affect the type of the try expression). In the context of the language, resources correspond to objects, so ResourceSpecification essentially instantiates a series of objects (multiple instantiations are separated by “,”). An example of using try-with-resources is shown below:
class Worker <: Resource {
var hasTools: Bool = false
let name: String
public init(name: String) {
this.name = name
}
public func getTools() {
println("${name} picks up tools from the warehouse.")
hasTools = true
}
public func work() {
if (hasTools) {
println("${name} does some work with tools.")
} else {
println("${name} doesn't have tools, does nothing.")
}
}
public func isClosed(): Bool {
if (hasTools) {
println("${name} hasn't returned the tool.")
false
} else {
println("${name} has no tools")
true
}
}
public func close(): Unit {
println("${name} returns the tools to the warehouse.")
hasTools = false
}
}
main() {
try (r = Worker("Tom")) {
r.getTools()
r.work()
}
try (r = Worker("Bob")) {
r.work()
}
try (r = Worker("Jack")) {
r.getTools()
throw Exception("Jack left, because of an emergency.")
}
}
Program output:
Tom picks up tools from the warehouse.
Tom does some work with tools.
Tom hasn't returned the tool.
Tom returns the tools to the warehouse.
Bob doesn't have tools, does nothing.
Bob has no tools
Jack picks up tools from the warehouse.
Jack hasn't returned the tool.
Jack returns the tools to the warehouse.
An exception has occurred:
Exception: Jack left, because of an emergency.
at test.main()(xxx/xx.cj:xx)
Variables introduced between the try keyword and {} have the same scope level as variables introduced within {}. Redefining the same name within {} will trigger a redefinition error.
class R <: Resource {
public func isClosed(): Bool {
true
}
public func close(): Unit {
print("R is closed")
}
}
main() {
try (r = R()) {
println("Get the resource")
let r = 0 // Error, redefinition
println(r)
}
}
The types in ResourceSpecification of a try-with-resources expression must implement the Resource interface:
interface Resource {
func isClosed(): Bool // Determines whether the `close` function should be called to release resources when exiting the try-with-resources scope.
func close(): Unit // Releases resources when `isClosed` returns false.
}
It is worth noting that try-with-resources expressions generally do not need to include catch or finally blocks, nor is it recommended for developers to manually release resources (redundant logic). However, if you need to explicitly catch and handle exceptions that may occur during the try block or resource acquisition/release, you can still include catch and finally blocks in try-with-resources expressions:
class R <: Resource {
public func isClosed(): Bool {
true
}
public func close(): Unit {
print("R is closed")
}
}
main() {
try (r = R()) {
println("Get the resource")
} catch (e: Exception) {
println("Exception happened when executing the try-with-resources expression")
} finally {
println("End of the try-with-resources expression")
}
}
Program output:
Get the resource
End of the try-with-resources expression
The type of a try-with-resources expression is Unit.
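The examples in this section acquire a single resource each. As a sketch of the comma-separated form mentioned at the start of the section, the following code acquires two resources in one ResourceSpecification. The class Handle and the names input and output are hypothetical. The order in which the two close calls run is not stated in this section, so no output is shown.

```cangjie
// Hypothetical resource type that tracks whether it has been closed.
class Handle <: Resource {
    let name: String
    var closed: Bool = false
    public init(name: String) {
        this.name = name
    }
    public func isClosed(): Bool {
        closed
    }
    public func close(): Unit {
        println("${name} closed")
        closed = true
    }
}

main(): Unit {
    // Two resources are acquired; both are in scope inside the block.
    try (input = Handle("input"), output = Handle("output")) {
        println("copying from ${input.name} to ${output.name}")
    }
}
```

Because isClosed returns false for both objects when the block exits, close is expected to be called on each of them.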
Advanced CatchPattern Introduction
Most of the time, you only want to catch exceptions of a specific type and its subtypes. In such cases, use the type pattern of CatchPattern. However, sometimes you may need to handle all exceptions uniformly (e.g., when no exception should occur, and any exception triggers a uniform error message). In such cases, use the wildcard pattern of CatchPattern.
The type pattern has two syntactic forms:
- Identifier: ExceptionClass: This form catches exceptions of type ExceptionClass and its subclasses. The caught exception instance is cast to ExceptionClass and bound to the variable defined by Identifier, which can then be used to access the exception instance in the catch block.
- Identifier: ExceptionClass_1 | ExceptionClass_2 | ... | ExceptionClass_n: This form concatenates multiple exception classes using the | operator, which represents an “or” relationship. It catches exceptions of type ExceptionClass_1 or its subclasses, or ExceptionClass_2 or its subclasses, and so on (assuming n > 1). When the exception type matches any of these types or their subtypes, it is caught. However, since the exact type cannot be determined statically, the caught exception is cast to the least common superclass of all types connected by | and bound to the variable defined by Identifier. Thus, in this mode, the catch block can only access members of the least common superclass via the Identifier variable. Alternatively, a wildcard can replace the Identifier in the type pattern, with the only difference being that the wildcard does not perform binding.
Example:
main(): Int64 {
try {
throw IllegalArgumentException("This is an Exception!")
} catch (e: OverflowException) {
println(e.message)
println("OverflowException is caught!")
} catch (e: IllegalArgumentException | NegativeArraySizeException) {
println(e.message)
println("IllegalArgumentException or NegativeArraySizeException is caught!")
} finally {
println("finally is executed!")
}
return 0
}
Execution result:
This is an Exception!
IllegalArgumentException or NegativeArraySizeException is caught!
finally is executed!
Example demonstrating “the caught exception type is the least common superclass of all types connected by |”:
open class Father <: Exception {
var father: Int32 = 0
}
class ChildOne <: Father {
var childOne: Int32 = 1
}
class ChildTwo <: Father {
var childTwo: Int32 = 2
}
main() {
try {
throw ChildOne()
} catch (e: ChildTwo | ChildOne) {
println("${e is Father}")
}
}
Execution result:
true
The syntax for wildcard pattern is _, which can catch any type of exception thrown within the same-level try block. It is equivalent to the type pattern e: Exception, meaning it catches exceptions defined by subclasses of Exception. Example as follows:
// Catch with wildcardPattern.
try {
throw OverflowException()
} catch (_) {
println("catch an exception!")
}
Common Runtime Exceptions
Cangjie has the most common exception classes built in, and developers can use them directly.
| Exception | Description |
|---|---|
| ConcurrentModificationException | Exception caused by concurrent modification |
| IllegalArgumentException | Exception thrown when passing illegal or incorrect arguments |
| NegativeArraySizeException | Exception thrown when creating an array with negative size |
| NoneValueException | Exception caused when a value does not exist, such as a key not found in a Map |
| OverflowException | Arithmetic overflow exception |
Using Option
The Option type was introduced earlier, defining its characteristics. Since the Option type can represent both a value and the absence of a value (where the absence may be interpreted as an error in certain contexts), it can also be used for error handling.
For example, in the following case, if the parameter value of the function getOrThrow equals Some(v), it returns the value v; if the parameter equals None, it throws an exception.
func getOrThrow(a: ?Int64) {
match (a) {
case Some(v) => v
case None => throw NoneValueException()
}
}
Because Option is an extremely common type, Cangjie provides multiple destructuring methods to facilitate its usage, including: pattern matching, the coalescing operator (??), the question mark operator (?), and the getOrThrow function. Each of these methods will be explained in detail below.
- Pattern Matching: Since the Option type is an enum type, the pattern matching for enums mentioned earlier can be used to destructure Option values. For example, in the following function getString, which takes a parameter of type ?Int64, if the parameter is a Some value, it returns the string representation of the contained number; if the parameter is None, it returns the string "none".

    func getString(p: ?Int64): String {
        match (p) {
            case Some(x) => "${x}"
            case None => "none"
        }
    }

    main() {
        let a = Some(1)
        let b: ?Int64 = None
        let r1 = getString(a)
        let r2 = getString(b)
        println(r1)
        println(r2)
    }

  The execution result of the above code is:

    1
    none

- Coalescing Operator (??): For an expression e1 of type ?T, if you want to return a value e2 of type T when e1 equals None, you can use the ?? operator. For the expression e1 ?? e2, if e1 equals Some(v), it returns v; otherwise, it returns e2. Example:

    main() {
        let a = Some(1)
        let b: ?Int64 = None
        let r1: Int64 = a ?? 0
        let r2: Int64 = b ?? 0
        println(r1)
        println(r2)
    }

  The execution result of the above code is:

    1
    0

- Question Mark Operator (?): The ? operator must be used in conjunction with ., (), [], or {} (specifically in trailing lambda calls) to enable support for these operations on Option types. Taking . as an example (similarly for (), [], and {}), for an expression e of type ?T1, when e equals Some(v), the value of e?.b is Option<T2>.Some(v.b); otherwise, it is Option<T2>.None, where T2 is the type of v.b. Example:

    struct R {
        public var a: Int64
        public init(a: Int64) {
            this.a = a
        }
    }

    let r = R(100)
    let x = Some(r)
    let y = Option<R>.None
    let r1 = x?.a // r1 = Option<Int64>.Some(100)
    let r2 = y?.a // r2 = Option<Int64>.None

    class C {
        var item: Int64 = 100
    }
    let c = C()
    let c1 = Option<C>.Some(c)
    let c2 = Option<C>.None

    func test1() {
        c1?.item = 200 // c.item = 200
        c2?.item = 300 // no effect
    }

  The question mark operator (?) supports multi-level access. For example, in a?.b.c?.d (similarly for (), [], and {}), the expression a must be of type Option<T1>, where T1 contains an instance member b. The type of b must contain an instance member c, and c must be of type Option<T2>, where T2 contains an instance member d. The type of a?.b.c?.d is Option<T3>, where T3 is the type of T2’s instance member d. When a equals Some(va) and va.b.c equals Some(vc), the value of a?.b.c?.d is Option<T3>.Some(vc.d). When a equals Some(va) but va.b.c equals None, the value is Option<T3>.None (d is not evaluated). When a equals None, the value is Option<T3>.None (b, c, and d are not evaluated).

    class A {
        public var b: B = B()
    }

    class B {
        public var c: Option<C> = C()
        public var c1: Option<C> = Option<C>.None
    }

    class C {
        public var d: Int64 = 100
    }

    main() {
        var a = Some(A())
        let a1 = a?.b.c?.d // a1 = Option<Int64>.Some(100)
        let a2 = a?.b.c1?.d // a2 = Option<Int64>.None

        a?.b.c?.d = 200 // a.b.c.d = 200
        a?.b.c1?.d = 200 // no effect
    }

- getOrThrow Function: For an expression e of type ?T, you can destructure it by calling the getOrThrow function. When e equals Some(v), getOrThrow() returns v; otherwise, it throws an exception. Example:

    main() {
        let a = Some(1)
        let b: ?Int64 = None
        let r1 = a.getOrThrow()
        println(r1)

        try {
            let r2 = b.getOrThrow()
        } catch (e: NoneValueException) {
            println("b is None")
        }
    }

  The execution result of the above code is:

    1
    b is None
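The destructuring methods can also be combined. The sketch below chains the question mark operator with the coalescing operator so that a missing value becomes a default. The function nameLength is a hypothetical helper, and the example relies on the size property of String, which is not introduced in this section.

```cangjie
// Hypothetical helper: length of the name, or 0 when no name is present.
func nameLength(name: ?String): Int64 {
    // name?.size has type Option<Int64>; ?? unwraps it or supplies the default.
    name?.size ?? 0
}

main(): Unit {
    let present: ?String = Some("Cangjie")
    let missing: ?String = None
    println(nameLength(present))
    println(nameLength(missing))
}
```

This prints 7 and then 0. Compared with getOrThrow, this form suits cases where the absence of a value is expected and has a sensible default.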
Concurrency Overview
Concurrent programming is an indispensable feature in modern programming languages. The Cangjie programming language provides a preemptive thread model as its concurrency mechanism. Threads can be categorized into two distinct concepts: language threads and native threads.
- Language threads are the fundamental execution units in a programming language’s concurrency model. The Cangjie programming language aims to provide developers with a friendly, efficient, and unified concurrency programming interface, allowing them to focus on writing concurrent code without worrying about differences between operating system threads and user-mode threads. Therefore, it introduces the concept of Cangjie threads. In most cases, developers only need to write concurrent code targeting Cangjie threads.
- Native threads refer to the threads used in the language implementation (typically operating system threads), which serve as the concrete carriers for language threads. Different programming languages implement language threads in various ways. For example, some languages directly create threads through operating system calls, meaning each language thread corresponds to a native thread. This implementation is generally referred to as the 1:1 thread model. Alternatively, some languages provide specialized thread implementations that allow multiple language threads to switch execution across multiple native threads. This is known as the M:N thread model, where M language threads are scheduled to execute on N native threads, with M and N not necessarily being equal. Currently, the Cangjie language implementation also adopts the M:N thread model. Thus, Cangjie threads are essentially lightweight user-mode threads that support preemption and are more lightweight compared to operating system threads.
Cangjie threads are fundamentally lightweight user-mode threads. Each Cangjie thread is scheduled and executed by an underlying native thread, and multiple Cangjie threads can be executed by a single native thread. Each native thread continuously selects a ready Cangjie thread for execution. If a Cangjie thread blocks during execution (e.g., waiting for a mutex to be released), the native thread will suspend the current Cangjie thread and proceed to select the next ready one. A blocked Cangjie thread will resume execution once it becomes ready again.
In most cases, developers only need to focus on writing concurrent code for Cangjie threads without considering these details. However, during cross-language programming, developers must exercise caution when calling potentially blocking foreign functions, such as operating system calls related to I/O. For example, in the following code snippet, a new thread calls the foreign function socket_read. During program execution, a native thread will schedule and execute this Cangjie thread. Upon entering the foreign function, the system call will directly block the current native thread until the function completes. During this blocking period, the native thread cannot schedule other Cangjie threads for execution, which reduces the program’s throughput.
foreign func socket_read(sock: Int64): CPointer<Int8>
let fut = spawn {
let sock: Int64 = ...
let ptr = socket_read(sock)
}
Note:
In this document, the term thread will be used as a shorthand for Cangjie thread when there is no ambiguity.
Creating Threads
When developers wish to execute a segment of code concurrently, they simply need to create a Cangjie thread. To create a new Cangjie thread, use the keyword spawn followed by a parameterless lambda expression, which represents the code to be executed in the new thread.
In the following example code, both the main thread and the new thread will attempt to print some text:
main(): Int64 {
spawn { =>
println("New thread before sleeping")
sleep(100 * Duration.millisecond) // sleep for 100ms.
println("New thread after sleeping")
}
println("Main thread")
return 0
}
In the above example, the new thread is terminated when the main thread ends, regardless of whether the new thread has completed its execution. The output of the above example may vary slightly each time and could produce something similar to:
New thread before sleeping
Main thread
The sleep() function suspends the current thread for the specified duration before resuming execution, with the timing determined by the specified Duration type. For detailed information about sleep(), please refer to the Sleeping for a Specified Duration section.
Accessing Threads
Using Future<T> to Wait for Thread Completion and Retrieve Return Values
In the previous example, the newly created thread may terminate prematurely due to the main thread’s completion. Without proper sequencing guarantees, it’s even possible for the newly created thread to exit before getting a chance to execute. The return value of the spawn expression can be used to wait for thread completion.
The return type of the spawn expression is Future<T>, where T is a type parameter matching the return type of the lambda expression. When calling the get() member function of Future<T>, it will wait for the corresponding thread to complete execution.
The prototype declaration of Future<T> is as follows:
public class Future<T> {
// Blocking the current thread, waiting for the result of the thread corresponding to the current Future object.
// If an exception occurs in the corresponding thread, the method will throw the exception.
public func get(): T
// Blocking the current thread, waiting for the result of the thread corresponding to the current Future object.
// If the corresponding thread has not completed execution within the given timeout, the method will throw TimeoutException.
// If `timeout` <= Duration.Zero, its behavior is the same as `get()`.
public func get(timeout: Duration): T
// Non-blocking method that immediately returns Option<T>.None if thread has not finished execution.
// Returns the computed result otherwise.
// If an exception occurs in the corresponding thread, the method will throw the exception.
public func tryGet(): Option<T>
}
The following example demonstrates how to use Future<T> to wait for the newly created thread to complete execution within the main function:
main(): Int64 {
let fut: Future<Unit> = spawn { =>
println("New thread before sleeping")
sleep(100 * Duration.millisecond) // sleep for 100ms.
println("New thread after sleeping")
}
println("Main thread")
fut.get() // wait for the thread to finish.
return 0
}
Calling get() on a Future<T> instance blocks the currently running thread until the thread represented by the Future<T> instance completes execution. Therefore, the above example might produce output similar to:
New thread before sleeping
Main thread
New thread after sleeping
After printing, the main thread will wait for the newly created thread to complete due to the get() call. However, the printing order between the main thread and the new thread is nondeterministic.
If fut.get() is moved before the main thread’s print statement as shown below:
main(): Int64 {
let fut: Future<Unit> = spawn { =>
println("New thread before sleeping")
sleep(100 * Duration.millisecond) // sleep for 100ms.
println("New thread after sleeping")
}
fut.get() // wait for the thread to finish.
println("Main thread")
return 0
}
The main thread will wait for the newly created thread to complete before executing its print statement, making the program output deterministic as follows:
New thread before sleeping
New thread after sleeping
Main thread
This demonstrates how the placement of get() calls affects whether threads can run concurrently.
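The same reasoning applies when several threads are created. In the following sketch, both threads are spawned before either get() call, so their sleeps can overlap; the variable names first and second are illustrative.

```cangjie
main(): Int64 {
    // Start both threads before waiting on either of them.
    let first: Future<Unit> = spawn { =>
        sleep(100 * Duration.millisecond)
        println("first done")
    }
    let second: Future<Unit> = spawn { =>
        sleep(100 * Duration.millisecond)
        println("second done")
    }
    first.get()  // wait for the first thread to finish.
    second.get() // wait for the second thread to finish.
    println("both finished")
    return 0
}
```

The order of the first two lines of output is nondeterministic, but both finished is always printed last. Calling first.get() immediately after the first spawn, before creating the second thread, would instead force the two threads to run one after the other.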
Beyond blocking for thread completion, Future<T> can also retrieve execution results. Below are its specific member functions:
- get(): T: Blocks until thread completion and returns the execution result. If the thread has already completed, returns the result directly.

  Example code:

    main(): Int64 {
        let fut: Future<Int64> = spawn {
            sleep(Duration.second) // sleep for 1s.
            return 1
        }

        try {
            // wait for the thread to finish, and get the result.
            let res: Int64 = fut.get()
            println("result = ${res}")
        } catch (_) {
            println("oops")
        }
        return 0
    }

  Output:

    result = 1

- get(timeout: Duration): T: Blocks until thread completion and returns the execution result. If the timeout period is reached before thread completion, throws TimeoutException. When timeout <= Duration.Zero, behaves identically to get().

  Example code:

    main(): Int64 {
        let fut = spawn {
            sleep(Duration.second) // sleep for 1s.
            return 1
        }

        // wait for the thread to finish, but only for 1ms.
        try {
            let res = fut.get(Duration.millisecond * 1)
            println("result: ${res}")
        } catch (_: TimeoutException) {
            println("oops")
        }
        return 0
    }

  Output:

    oops
Accessing Thread Properties
Each Future<T> object corresponds to a Cangjie thread represented by a Thread object. The Thread class primarily provides access to thread property information, such as thread identifiers. Note that Thread objects cannot be directly instantiated—they can only be obtained through the thread member property of Future<T> or via the static currentThread property of the Thread class to get the Thread object representing the currently executing thread.
Partial method definitions of the Thread class are shown below (complete method descriptions can be found in the Cangjie Programming Language Library API).
class Thread {
// Get the currently running thread
static prop currentThread: Thread
// Get the unique identifier (represented as an integer) of the thread object
prop id: Int64
// Check whether the thread has any cancellation request
prop hasPendingCancellation: Bool
}
The following example demonstrates retrieving thread identifiers through both methods after creating a new thread. Since both the main thread and new thread access the same Thread object, they print identical thread identifiers.
main(): Unit {
let fut = spawn {
println("Current thread id: ${Thread.currentThread.id}")
}
println("New thread id: ${fut.thread.id}")
fut.get()
}
Sample output (thread IDs may vary):
New thread id: 1
Current thread id: 1
Terminating Threads
The cancel() method of Future<T> can be used to send a termination request to the corresponding thread, but it does not forcibly stop thread execution. Developers need to check whether a termination request exists for the thread using the hasPendingCancellation property of Thread.
Generally, if a termination request exists for a thread, developers should implement appropriate thread termination logic. Therefore, how to terminate a thread is entirely up to the developer’s discretion. If the developer ignores the termination request, the thread will continue executing until it completes normally.
Example code:
import std.sync.SyncCounter
main(): Unit {
let syncCounter = SyncCounter(1)
let fut = spawn {
syncCounter.waitUntilZero() // block until the syncCounter becomes zero
if (Thread.currentThread.hasPendingCancellation) { // Check cancellation request
println("cancelled")
return
}
println("hello")
}
fut.cancel() // Send cancellation request
syncCounter.dec()
fut.get() // Join thread
}
Output:
cancelled
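For a long-running thread, the same check is often placed in a loop so that the thread can stop between units of work. The following is a minimal sketch, assuming the work can be split into short units (simulated here with sleep); it uses only the cancel(), hasPendingCancellation, and get() members described above.

```cangjie
main(): Int64 {
    let fut = spawn {
        // Poll for a cancellation request between units of work.
        while (!Thread.currentThread.hasPendingCancellation) {
            sleep(Duration.millisecond) // stand-in for one unit of real work
        }
        println("worker stopped")
    }
    // Let the worker run for a while, then ask it to stop.
    sleep(10 * Duration.millisecond)
    fut.cancel()
    fut.get() // returns once the worker has left its loop
    return 0
}
```

Under these assumptions the program prints worker stopped once the request has been observed. How often the flag is checked determines how quickly the thread reacts to cancel().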
Synchronization Mechanisms
In concurrent programming, without proper synchronization mechanisms to protect variables shared among multiple threads, data race issues can easily occur.
The Cangjie programming language provides three common synchronization mechanisms to ensure thread safety of data: atomic operations, mutex locks, and condition variables.
Atomic Operations
Cangjie provides atomic operations for integer types, Bool type, and reference types.
The integer types include: Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64.
Atomic operations for integer types support basic read/write, exchange, and arithmetic operations:
| Operation | Functionality |
|---|---|
| load | Read |
| store | Write |
| swap | Exchange, returns the value before exchange |
| compareAndSwap | Compare-and-swap, returns true if successful, otherwise false |
| fetchAdd | Addition, returns the value before addition |
| fetchSub | Subtraction, returns the value before subtraction |
| fetchAnd | Bitwise AND, returns the value before operation |
| fetchOr | Bitwise OR, returns the value before operation |
| fetchXor | Bitwise XOR, returns the value before operation |
Important notes:
- The return value of exchange and arithmetic operations is the value before modification.
- compareAndSwap checks whether the atomic variable’s current value equals the old value; if so, it replaces it with the new value; otherwise, no replacement occurs.
Taking Int8 as an example, the corresponding atomic operation type declaration is as follows:
class AtomicInt8 {
public func load(): Int8
public func store(val: Int8): Unit
public func swap(val: Int8): Int8
public func compareAndSwap(old: Int8, new: Int8): Bool
public func fetchAdd(val: Int8): Int8
public func fetchSub(val: Int8): Int8
public func fetchAnd(val: Int8): Int8
public func fetchOr(val: Int8): Int8
public func fetchXor(val: Int8): Int8
}
Each method of the atomic types mentioned above has a corresponding method that accepts memory ordering parameters. Currently, only sequential consistency is supported for memory ordering.
Similarly, other integer types have corresponding atomic operation types:
class AtomicInt16 {...}
class AtomicInt32 {...}
class AtomicInt64 {...}
class AtomicUInt8 {...}
class AtomicUInt16 {...}
class AtomicUInt32 {...}
class AtomicUInt64 {...}
The following example demonstrates how to use atomic operations to implement counting in a multi-threaded program:
import std.sync.AtomicInt64
import std.collection.ArrayList
let count = AtomicInt64(0)
main(): Int64 {
let list = ArrayList<Future<Int64>>()
// create 1000 threads.
for (_ in 0..1000) {
let fut = spawn {
sleep(Duration.millisecond) // sleep for 1ms.
count.fetchAdd(1)
}
list.add(fut)
}
// Wait for all threads to finish.
for (f in list) {
f.get()
}
let val = count.load()
println("count = ${val}")
return 0
}
The expected output is:
count = 1000
Here are some other correct examples of using integer-type atomic operations:
var obj: AtomicInt32 = AtomicInt32(1)
var x = obj.load() // x: 1, the type is Int32
x = obj.swap(2) // x: 1
x = obj.load() // x: 2
var y = obj.compareAndSwap(2, 3) // y: true, the type is Bool.
y = obj.compareAndSwap(2, 3) // y: false, the value in obj is no longer 2 but 3. Therefore, the CAS operation fails.
x = obj.fetchAdd(1) // x: 3
x = obj.load() // x: 4
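When an update has no dedicated fetch operation, compareAndSwap is commonly used in a retry loop. The sketch below keeps the largest value seen by any thread; updateMax and maxSeen are hypothetical names introduced for illustration, and only the load and compareAndSwap operations listed above are used.

```cangjie
import std.sync.AtomicInt64
import std.collection.ArrayList

let maxSeen = AtomicInt64(0)

// Hypothetical helper: raise maxSeen to `candidate` if `candidate` is larger.
func updateMax(candidate: Int64): Unit {
    while (true) {
        let current = maxSeen.load()
        if (candidate <= current) {
            return // nothing to do, the stored value is already large enough
        }
        if (maxSeen.compareAndSwap(current, candidate)) {
            return // our value was installed
        }
        // Another thread changed the value in between; reload and retry.
    }
}

main(): Int64 {
    let list = ArrayList<Future<Unit>>()
    for (i in 0..100) {
        let fut = spawn {
            updateMax(i)
        }
        list.add(fut)
    }
    for (f in list) {
        f.get()
    }
    println("max = ${maxSeen.load()}")
    return 0
}
```

Because the range 0..100 excludes 100, the program should print max = 99 regardless of thread scheduling.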
Atomic operations for Bool type and reference types only provide read/write and exchange operations:
| Operation | Functionality |
|---|---|
| load | Read |
| store | Write |
| swap | Exchange, returns the value before exchange |
| compareAndSwap | Compare-and-swap, returns true if successful, otherwise false |
Note:
Atomic reference operations are only valid for reference types.
The atomic reference type is AtomicReference. Here are some correct examples of using Bool type and reference-type atomic operations:
import std.sync.{AtomicBool, AtomicReference}
class A {}
main() {
var obj = AtomicBool(true)
var x1 = obj.load() // x1: true, the type is Bool
println(x1)
var t1 = A()
var obj2 = AtomicReference(t1)
var x2 = obj2.load() // x2 and t1 are the same object
var y1 = obj2.compareAndSwap(x2, t1) // x2 and t1 are the same object, y1: true
println(y1)
var t2 = A()
var y2 = obj2.compareAndSwap(t2, A()) // t2 and t1 are not the same object, CAS fails, y2: false
println(y2)
y2 = obj2.compareAndSwap(t1, A()) // CAS succeeds, y2: true
println(y2)
}
Compiling and executing the above code produces the following output:
true
true
false
true
Reentrant Mutex Lock
The reentrant mutex lock protects critical sections, ensuring that only one thread can execute the critical section code at any given time. When a thread attempts to acquire a lock held by another thread, it blocks until the lock is released. Reentrant means a thread can acquire the same lock multiple times.
When using a reentrant mutex lock, two rules must be strictly followed:
- Before accessing shared data, the lock must be acquired;
- After processing the shared data, the lock must be released to allow other threads to acquire it.
The main member functions provided by Mutex are as follows:
public class Mutex <: UniqueLock {
// Create a Mutex.
public init()
// Locks the mutex, blocks if the mutex is not available.
public func lock(): Unit
// Unlocks the mutex. If there are other threads blocking on this
// lock, then wake up one of them.
public func unlock(): Unit
// Tries to lock the mutex, returns false if the mutex is not
// available, otherwise returns true.
public func tryLock(): Bool
// Generate a Condition instance for the mutex.
public func condition(): Condition
}
The following example demonstrates how to use Mutex to protect access to the global shared variable count. Operations on count constitute the critical section:
import std.sync.Mutex
import std.collection.ArrayList
var count: Int64 = 0
let mtx = Mutex()
main(): Int64 {
let list = ArrayList<Future<Unit>>()
// create 1000 threads.
for (i in 0..1000) {
let fut = spawn {
sleep(Duration.millisecond) // sleep for 1ms.
mtx.lock()
count++
mtx.unlock()
}
list.add(fut)
}
// Wait for all threads to finish.
for (f in list) {
f.get()
}
println("count = ${count}")
return 0
}
The expected output is:
count = 1000
The following example demonstrates how to use tryLock:
import std.sync.Mutex
main(): Int64 {
let mtx: Mutex = Mutex()
var future: Future<Unit> = spawn {
mtx.lock()
println("get the lock, do something")
sleep(Duration.millisecond * 10)
mtx.unlock()
}
try {
future.get(Duration.millisecond * 10)
} catch (e: TimeoutException) {
if (mtx.tryLock()) {
println("tryLock success, do something")
mtx.unlock()
return 0
}
println("tryLock failed, do nothing")
return 0
}
return 0
}
One possible output is:
get the lock, do something
Here are some incorrect examples of mutex usage:
Error Example 1: A thread fails to unlock after operating on the critical section, causing other threads to block while waiting for the lock.
import std.sync.Mutex
var sum: Int64 = 0
let mutex = Mutex()
main() {
let foo = spawn { =>
mutex.lock()
sum = sum + 1
}
let bar = spawn { =>
mutex.lock()
sum = sum + 1
}
foo.get()
println("${sum}")
bar.get() // Because the lock is never released, any other thread waiting for this mutex stays blocked.
}
Error Example 2: Calling unlock without holding the lock in the current thread will throw an exception.
import std.sync.Mutex
var sum: Int64 = 0
let mutex = Mutex()
main() {
let foo = spawn { =>
sum = sum + 1
mutex.unlock() // Error: unlocking without holding the lock throws IllegalSynchronizationStateException.
}
foo.get()
}
Error Example 3: tryLock() does not guarantee acquiring the lock, which may lead to operations on critical sections without lock protection and exceptions when calling unlock without holding the lock.
import std.sync.Mutex
var sum: Int64 = 0
let mutex = Mutex()
main() {
for (i in 0..100) {
spawn { =>
mutex.tryLock() // Error: `tryLock()` only attempts to acquire the lock and may fail; ignoring its result can lead to abnormal behavior.
sum = sum + 1
mutex.unlock()
}
}
}
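One way to repair Error Example 3 is to check the result of tryLock() and to touch the shared data and call unlock() only when the lock was actually acquired. This is a sketch rather than a recommended pattern: threads that fail to get the lock simply skip the update, so the final value of sum depends on scheduling.

```cangjie
import std.sync.Mutex
import std.collection.ArrayList

var sum: Int64 = 0
let mutex = Mutex()

main(): Int64 {
    let list = ArrayList<Future<Unit>>()
    for (_ in 0..100) {
        let fut = spawn {
            // Enter the critical section only if tryLock() reports success.
            if (mutex.tryLock()) {
                sum = sum + 1
                mutex.unlock() // safe: this thread holds the lock
            }
        }
        list.add(fut)
    }
    for (f in list) {
        f.get()
    }
    // At most 100; lower if some threads found the lock busy.
    println("sum = ${sum}")
    return 0
}
```

If every increment must happen, lock() or the synchronized keyword is the appropriate choice instead of tryLock().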
Additionally, Mutex is designed as a reentrant lock, meaning: when a thread already holds a Mutex lock, attempting to acquire the same Mutex lock again will always immediately succeed.
Note:
Although Mutex is a reentrant lock, the number of unlock() calls must match the number of lock() calls to successfully release the lock.
The following example demonstrates the reentrant property of Mutex:
import std.sync.Mutex
var count: Int64 = 0
let mtx = Mutex()
func foo() {
mtx.lock()
count += 10
bar()
mtx.unlock()
}
func bar() {
mtx.lock()
count += 100
mtx.unlock()
}
main(): Int64 {
let fut = spawn {
sleep(Duration.millisecond) // sleep for 1ms.
foo()
}
foo()
fut.get()
println("count = ${count}")
return 0
}
The output should be:
count = 220
In the above example, whether in the main thread or the newly created thread, if the lock is already acquired in foo(), then calling bar() will immediately acquire the same Mutex lock without causing deadlock.
Condition
Condition is a condition variable (i.e., wait queue) bound to a mutex lock. Condition instances are created by mutex locks, and a single mutex can create multiple Condition instances. Condition allows threads to block and wait for signals from other threads to resume execution. This is a thread synchronization mechanism using shared variables, providing the following main methods:
public class Mutex <: UniqueLock {
// ...
// Generate a Condition instance for the mutex.
public func condition(): Condition
}
public interface Condition {
// Wait for a signal, blocking the current thread.
func wait(): Unit
func wait(timeout!: Duration): Bool
// Wait for a signal and predicate, blocking the current thread.
func waitUntil(predicate: ()->Bool): Unit
func waitUntil(predicate: ()->Bool, timeout!: Duration): Bool
// Wake up one thread of those waiting on the monitor, if any.
func notify(): Unit
// Wake up all threads waiting on the monitor, if any.
func notifyAll(): Unit
}
Before calling wait, notify, or notifyAll methods of the Condition interface, ensure the current thread holds the bound lock. The wait method performs the following actions:
1. Adds the current thread to the wait queue of the corresponding lock;
2. Blocks the current thread while fully releasing the lock and recording the lock’s reentrant count;
3. Waits for another thread to signal this thread using notify or notifyAll on the same Condition instance;
4. When awakened, the thread automatically attempts to reacquire the lock with the same reentrant state as recorded in step 2; if acquisition fails, the thread blocks on the lock.
The wait method accepts an optional timeout parameter. Note that many common operating systems do not guarantee real-time scheduling, so precise blocking for “exactly N nanoseconds” cannot be guaranteed—system-dependent inaccuracies may be observed. Additionally, the current language specification explicitly allows implementations to produce spurious wakeups—in such cases, the wait return value is implementation-dependent (either true or false). Therefore, developers are encouraged to always wrap wait in a loop:
synchronized (obj) {
while (<condition is not true>) {
obj.wait()
}
}
Below is a correct example of using Condition:
import std.sync.Mutex
let mtx = Mutex()
let condition = synchronized(mtx) {
mtx.condition()
}
var flag: Bool = true
main(): Int64 {
let fut = spawn {
mtx.lock()
while (flag) {
println("New thread: before wait")
condition.wait()
println("New thread: after wait")
}
mtx.unlock()
}
// Sleep for 10ms, to make sure the new thread can be executed.
sleep(10 * Duration.millisecond)
mtx.lock()
println("Main thread: set flag")
flag = false
mtx.unlock()
mtx.lock()
println("Main thread: notify")
condition.notifyAll()
mtx.unlock()
// Wait for the new thread to finish.
fut.get()
return 0
}
The output should be:
New thread: before wait
Main thread: set flag
Main thread: notify
New thread: after wait
When executing wait on a Condition object, it must be done under lock protection; otherwise, the unlock operation in wait will throw an exception.
Below are some error examples of using condition variables:
import std.sync.Mutex
let m1 = Mutex()
let c1 = synchronized(m1) {
m1.condition()
}
let m2 = Mutex()
var flag: Bool = true
var count: Int64 = 0
func foo1() {
spawn {
m2.lock()
while (flag) {
c1.wait() // Error: the condition variable must be used with its own lock (m1), and that lock must be held. Otherwise, the unlock operation in `wait` throws an exception.
}
count = count + 1
m2.unlock()
}
m1.lock()
flag = false
c1.notifyAll()
m1.unlock()
}
func foo2() {
spawn {
while (flag) {
c1.wait() // Error: `wait` must be called while holding the lock.
}
count = count + 1
}
m1.lock()
flag = false
c1.notifyAll()
m1.unlock()
}
main() {
foo1()
foo2()
c1.wait()
}
In complex thread synchronization scenarios, multiple Condition instances may need to be generated for the same lock object. The following example implements a fixed-length bounded FIFO queue. When the queue is empty, get() blocks; when the queue is full, put() blocks.
import std.sync.{Mutex, Condition}
class BoundedQueue {
// Create a Mutex, two Conditions.
let m: Mutex = Mutex()
var notFull: Condition
var notEmpty: Condition
var count: Int64 // Object count in buffer.
var head: Int64 // Write index.
var tail: Int64 // Read index.
// Queue's length is 100.
let items: Array<Object> = Array<Object>(100, {i => Object()})
init() {
count = 0
head = 0
tail = 0
synchronized(m) {
notFull = m.condition()
notEmpty = m.condition()
}
}
// Insert an object, if the queue is full, block the current thread.
public func put(x: Object) {
// Acquire the mutex.
synchronized(m) {
while (count == 100) {
// If the queue is full, wait for the "queue notFull" event.
notFull.wait()
}
items[head] = x
head++
if (head == 100) {
head = 0
}
count++
// An object has been inserted and the current queue is no longer
// empty, so wake up the thread previously blocked on get()
// because the queue was empty.
notEmpty.notify()
} // Release the mutex.
}
// Pop an object, if the queue is empty, block the current thread.
public func get(): Object {
// Acquire the mutex.
synchronized(m) {
while (count == 0) {
// If the queue is empty, wait for the "queue notEmpty" event.
notEmpty.wait()
}
let x: Object = items[tail]
tail++
if (tail == 100) {
tail = 0
}
count--
// An object has been popped and the current queue is no longer
// full, so wake up the thread previously blocked on put()
// because the queue was full.
notFull.notify()
return x
} // Release the mutex.
}
}
The synchronized Keyword
The Lock provides a convenient and flexible way for locking operations. However, due to its flexibility, it may lead to issues such as forgetting to unlock or failing to automatically release held locks when exceptions occur while holding the lock. Therefore, the Cangjie programming language provides a synchronized keyword to be used with Lock, which automatically performs locking and unlocking operations within its following scope to address such problems.
The following example code demonstrates how to use the synchronized keyword to protect shared data:
import std.sync.Mutex
import std.collection.ArrayList
var count: Int64 = 0
let mtx = Mutex()
main(): Int64 {
let list = ArrayList<Future<Unit>>()
// create 1000 threads.
for (i in 0..1000) {
let fut = spawn {
sleep(Duration.millisecond) // sleep for 1ms.
// Use synchronized(mtx), instead of mtx.lock() and mtx.unlock().
synchronized(mtx) {
count++
}
}
list.add(fut)
}
// Wait for all threads to finish.
for (f in list) {
f.get()
}
println("count = ${count}")
return 0
}
The expected output should be:
count = 1000
By adding a Lock instance after synchronized, the code block it modifies is protected, ensuring that at most one thread can execute the protected code at any given time:
1. Before entering the synchronized code block, a thread automatically acquires the lock corresponding to the Lock instance. If the lock cannot be acquired, the current thread is blocked;
2. Before exiting the synchronized code block, a thread automatically releases the lock of the Lock instance.
For control transfer expressions (such as break, continue, return, throw), when they cause the program execution to exit the synchronized code block, they also comply with point 2 above, meaning the lock corresponding to the synchronized expression is automatically released.
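The throw case can be checked with a small experiment. The sketch below throws inside a synchronized block and then probes the lock from a second thread; the probe runs in another thread because Mutex is reentrant, so a tryLock() in the main thread would succeed even if the lock were still held.

```cangjie
import std.sync.Mutex

let mtx = Mutex()

main(): Int64 {
    try {
        synchronized(mtx) {
            throw Exception("failure inside the critical section")
        }
    } catch (e: Exception) {
        println("caught: ${e.message}")
    }
    // Probe from a different thread: tryLock() succeeds only if the
    // main thread no longer holds the lock.
    let fut = spawn {
        if (mtx.tryLock()) {
            mtx.unlock()
            return true
        }
        return false
    }
    println("lock free after throw: ${fut.get()}")
    return 0
}
```

Given the release rule described above, the second line printed should be lock free after throw: true.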
The following example demonstrates the case where a break statement appears within a synchronized code block:
import std.sync.Mutex
import std.collection.ArrayList
var count: Int64 = 0
var mtx: Mutex = Mutex()
main(): Int64 {
let list = ArrayList<Future<Unit>>()
for (i in 0..10) {
let fut = spawn {
while (true) {
synchronized(mtx) {
count = count + 1
break
println("in thread")
}
}
}
list.add(fut)
}
// Wait for all threads to finish.
for (f in list) {
f.get()
}
synchronized(mtx) {
println("in main, count = ${count}")
}
return 0
}
The expected output should be:
in main, count = 10
Note that the string "in thread" is never printed: the break statement transfers control out of the while loop, and before leaving the loop it first exits the synchronized code block, which releases the lock.
Thread-Local Variables (ThreadLocal)
Using the ThreadLocal from the core package, you can create and use thread-local variables. Each thread has its own independent storage space to hold these thread-local variables. Therefore, each thread can safely access its own thread-local variables without being affected by other threads.
public class ThreadLocal<T> {
/* Construct a Cangjie thread-local variable carrying a null value */
public init()
/* Get the value of the Cangjie thread-local variable */
public func get(): Option<T> // Returns Option<T>.None if the value does not exist. Return value Option<T> - the value of the Cangjie thread-local variable.
/* Set the value of the Cangjie thread-local variable via 'value' */
public func set(value: Option<T>): Unit // If Option<T>.None is passed, the value of the local variable will be deleted and cannot be retrieved in subsequent thread operations. Parameter value - the value to set for the local variable.
}
The following example code demonstrates how to use the ThreadLocal class to create and use thread-local variables for each thread:
main(): Int64 {
let tl = ThreadLocal<Int64>()
let fut1 = spawn {
tl.set(123)
println("tl in spawn1 = ${tl.get().getOrThrow()}")
}
let fut2 = spawn {
tl.set(456)
println("tl in spawn2 = ${tl.get().getOrThrow()}")
}
fut1.get()
fut2.get()
0
}
Possible output results are as follows:
tl in spawn1 = 123
tl in spawn2 = 456
Or:
tl in spawn2 = 456
tl in spawn1 = 123
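The isolation also applies to the thread that created the ThreadLocal object. In the following sketch only the spawned thread calls set(), so get() in the main thread is expected to return Option<Int64>.None, in line with the description of get() above.

```cangjie
main(): Int64 {
    let tl = ThreadLocal<Int64>()
    let fut = spawn {
        tl.set(123) // visible only inside this thread
    }
    fut.get()
    // The main thread never called set(), so it has no value of its own.
    match (tl.get()) {
        case Some(v) => println("main sees ${v}")
        case None => println("main sees no value")
    }
    return 0
}
```

Under this reading the program prints main sees no value.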
Thread Sleep for Specified Duration
The sleep function blocks the currently running thread, causing it to voluntarily sleep for a specified duration before resuming execution. Its parameter type is Duration. The function prototype is:
func sleep(dur: Duration): Unit // Sleep for at least `dur`.
Note:
If dur <= Duration.Zero, the current thread will only yield execution resources without entering sleep mode.
Below is an example of using sleep:
main(): Int64 {
println("Hello")
sleep(Duration.second) // sleep for 1s.
println("World")
return 0
}
The output is as follows:
Hello
World
Overview of I/O Streams
This chapter introduces fundamental I/O concepts and file operations.
In the Cangjie programming language, operations that interact with external carriers of applications are referred to as I/O operations. “I” stands for Input, and “O” stands for Output.
All I/O mechanisms in Cangjie are based on data streams for input and output, where these streams represent sequences of byte data. A data stream is a continuous collection of data, functioning like a pipeline that carries data—data is input at one end of the pipeline and output at the other.
The Cangjie programming language abstracts input and output as Streams.
- A stream that carries data from external storage into memory is called an input stream (InputStream). The input end writes data into the pipeline segment by segment, and these segments form one long data stream in sequence.
- A stream that carries data from memory to external storage is called an output stream (OutputStream). The output end reads data from the pipeline segment by segment; each read may take any length of data (it need not match the segments written at the input end), but data is always read in the order in which it was written.
With this layer of abstraction, the Cangjie programming language can use a unified interface to interact with external data.
The Cangjie programming language describes operations such as standard input/output, file operations, network data streams, string streams, encryption streams, compression streams, etc., uniformly using Stream.
Stream primarily deals with raw binary data, and the smallest data unit in a Stream is Byte.
The Cangjie programming language defines Stream as an interface, allowing different Streams to be combined using the decorator pattern, significantly enhancing extensibility.
Input Stream
A program reads data sources (including external devices like keyboards, files, networks, etc.) from an input stream. In other words, an input stream is a communication channel that reads data sources into the program.
The Cangjie programming language uses the InputStream interface type to represent an input stream. It provides a read function, which writes readable data into a buffer and returns the total number of bytes read in that operation.
Definition of the InputStream interface:
interface InputStream {
func read(buffer: Array<Byte>): Int64
}
When an input stream is available, byte data can be read as shown in the following code. The read data is written into the input parameter array of read.
Example of reading from an input stream:
import std.io.InputStream
main() {
let input: InputStream = ...
let buf = Array<Byte>(256, repeat: 0)
while (input.read(buf) > 0) {
println(buf)
}
}
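Because read returns the number of bytes it placed in the buffer, only that prefix of the buffer holds fresh data after each call. The sketch below makes this explicit with a self-contained stream; FixedInput is a hypothetical class written for this illustration (it is not part of the standard library) that implements the InputStream interface shown above and yields 600 bytes.

```cangjie
import std.io.InputStream

// Hypothetical in-memory stream that yields `remaining` bytes of value 1.
class FixedInput <: InputStream {
    var remaining: Int64 = 600

    public func read(buffer: Array<Byte>): Int64 {
        // Hand out at most one buffer's worth of data per call.
        let n = if (remaining < buffer.size) { remaining } else { buffer.size }
        for (i in 0..n) {
            buffer[i] = 1
        }
        remaining -= n
        return n
    }
}

main(): Int64 {
    let input: InputStream = FixedInput()
    let buf = Array<Byte>(256, repeat: 0)
    var total: Int64 = 0
    while (true) {
        let n = input.read(buf)
        if (n <= 0) {
            break // no more data
        }
        total += n // only the first n bytes of buf are valid after this call
    }
    println("read ${total} bytes")
    return 0
}
```

With a 256-byte buffer the three reads return 256, 256, and 88 bytes, so the program should print read 600 bytes.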
Output Stream
A program writes data to an output stream. An output stream is a communication channel that outputs data from the program to external devices (such as displays, printers, files, networks, etc.).
The Cangjie programming language uses the OutputStream interface type to represent an output stream. It provides a write function, which writes data from the buffer into the bound stream.
Notably, some output streams’ write operations do not immediately write to external storage but follow certain buffering strategies. Data is only physically written when certain conditions are met or when flush is explicitly called, aiming to improve performance.
To uniformly handle these flush operations, OutputStream includes a default implementation of flush, which helps smooth out API call differences.
Definition of the OutputStream interface:
interface OutputStream {
func write(buffer: Array<Byte>): Unit
func flush(): Unit {
// Empty implementation
}
}
When an output stream is available, byte data can be written.
Example of writing to an output stream:
import std.io.OutputStream
main() {
let output: OutputStream = ...
let buf = Array<Byte>(256, repeat: 111)
output.write(buf)
output.flush()
}
Classification of Data Streams
Based on their functional differences, Streams can be broadly categorized into two types:
- Node Streams: Directly provide data sources. Node streams are typically constructed by relying on direct external resources (such as files, networks, etc.).
- Processing Streams: Can only delegate other data streams for processing. Processing streams are usually constructed by depending on other streams.
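The relationship between the two categories can be sketched with the OutputStream interface defined earlier. Both classes below are hypothetical and exist only for this illustration: DiscardSink plays the role of a node stream, and CountingOutput is a processing stream that wraps any other OutputStream in the decorator style.

```cangjie
import std.io.OutputStream

// Hypothetical node stream: discards data but remembers how much arrived.
class DiscardSink <: OutputStream {
    var received: Int64 = 0

    public func write(buffer: Array<Byte>): Unit {
        received += buffer.size
    }
}

// Hypothetical processing stream: counts write calls, then delegates.
class CountingOutput <: OutputStream {
    let inner: OutputStream
    var calls: Int64 = 0

    init(inner: OutputStream) {
        this.inner = inner
    }

    public func write(buffer: Array<Byte>): Unit {
        calls++
        inner.write(buffer) // forward the data to the wrapped stream
    }

    public func flush(): Unit {
        inner.flush() // forward flush as well
    }
}

main(): Int64 {
    let sink = DiscardSink()
    let out = CountingOutput(sink)
    out.write(Array<Byte>(10, repeat: 0))
    out.write(Array<Byte>(10, repeat: 0))
    out.flush()
    println("calls = ${out.calls}, bytes = ${sink.received}")
    return 0
}
```

The program should print calls = 2, bytes = 20. Because CountingOutput depends only on the OutputStream interface, it can wrap a node stream or another processing stream without changes.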
I/O Node Streams
Node streams refer to streams that directly provide data sources. The construction of node streams typically relies on some direct external resource (such as files, networks, etc.).
Common node streams in the Cangjie programming language include standard streams (ConsoleReader, ConsoleWriter), file streams (File), network streams (Socket), etc.
This chapter introduces standard streams and file streams.
Standard Streams
Standard streams include standard input stream, standard output stream, and standard error output stream.
Standard streams serve as the standard interface for programs to interact with external data. During program execution, data is read from the input stream as program input, output information is transmitted to the output stream, and similarly, error information is sent to the error stream.
In the Cangjie programming language, the global functions getStdIn, getStdOut, and getStdErr can be used to obtain these three standard streams respectively.
To use these functions, the env package needs to be imported:
Example of importing the env package:
import std.env.*
The ConsoleReader and ConsoleWriter types in the env package provide user-friendly wrappers for these three standard streams (since the standard error output stream is also an output stream, there are two types in total). They offer more convenient String-based extended operations and provide rich overloads for many common types to optimize performance.
Most importantly, ConsoleReader and ConsoleWriter types guarantee thread safety, allowing safe reading and writing through their interfaces in any thread.
By default, the standard input stream comes from keyboard input, such as text entered in a command-line interface.
When data needs to be obtained from the standard input stream, the ConsoleReader type can be acquired via the getStdIn global function, and then the readln function of this type can be used to get command-line input.
Example of reading from the standard input stream:
import std.env.getStdIn
main() {
let consoleReader = getStdIn()
let txt = consoleReader.readln()
println(txt ?? "")
}
Run the above code, enter some text in the command line, and then press Enter to see the input content.
The output stream is divided into standard output stream and standard error stream. By default, both output to the screen, such as the text seen in the command-line interface.
When data needs to be written to the standard output stream, the ConsoleWriter can be obtained via the getStdOut/getStdErr global functions to write data, such as using the write function to print content to the console.
The difference between using ConsoleWriter and directly using the print function is that ConsoleWriter is thread-safe and, due to its caching technology, offers better performance when dealing with large amounts of content.
Note that after writing data, flush must be called on ConsoleWriter to ensure the content is completely written to the standard stream.
Example of writing to the standard output stream:
import std.env.getStdOut
main() {
let consoleWriter = getStdOut()
for (_ in 0..1000) {
consoleWriter.writeln("hello, world!")
}
consoleWriter.flush()
}
File Streams
The Cangjie programming language provides the fs package to support general file system tasks. Different operating systems offer varying interfaces for file systems. The Cangjie programming language abstracts the following common functionalities, providing unified interfaces to mask differences between operating systems and simplify usage.
Common operations include: creating files/directories, reading/writing files, renaming or moving files/directories, deleting files/directories, copying files/directories, obtaining file/directory metadata, and checking file/directory existence. Specific APIs can be found in the library documentation.
To use file system-related functionalities, the fs package must be imported:
Example of importing the fs package:
import std.fs.*
This chapter focuses on the usage of File. For Path and Directory usage, please refer to the corresponding API documentation.
The File type in the Cangjie programming language provides both conventional file operations and file stream functionalities.
Conventional File Operations
For conventional file operations, a series of static functions can be used to perform quick operations.
For example, to check if a file exists at a certain path, the exists function can be used. When exists returns true, it indicates the file exists; otherwise, it does not.
Example of using the exists function:
import std.fs.exists
main() {
let exist = exists("./tempFile.txt")
println("exist: ${exist}")
}
Moving, copying, and deleting files are also straightforward. The fs package provides corresponding functions: rename (which also serves to move a file), copy, and remove.
Example of using the copy, rename, and remove functions:
import std.fs.{copy, rename, remove}
main() {
copy("./tempFile.txt", to: "./tempFile2.txt", overwrite: false)
rename("./tempFile2.txt", to: "./tempFile3.txt", overwrite: false)
remove("./tempFile3.txt")
}
If all data from a file needs to be read at once or data needs to be written to a file in one go, the readFrom and writeTo functions provided by File can be used. For small amounts of data, these functions are both simple to use and offer good performance without manual stream handling.
Example of using readFrom and writeTo functions:
import std.fs.File
main() {
let bytes = File.readFrom("./tempFile.txt") // Reads all data at once
File.writeTo("./otherFile.txt", bytes) // Writes all data to another file at once
}
File Stream Operations
In addition to the conventional file operations mentioned above, the File type is also designed as a data stream type. Therefore, the File type implements the IOStream interface. When a File instance is created, it can be used as a data stream.
Definition of the File class:
public class File <: Resource & IOStream & Seekable {
...
}
File provides two construction methods: one uses the convenient static function create to directly create a new file instance, and the other uses a constructor with a complete file opening mode to create a new instance.
Files created with create are write-only; read operations on such instances will throw a runtime exception.
Example of File construction:
import std.fs.*
import std.io.*
main() {
    // Create
    let file = File.create("./tempFile.txt")
    file.write("hello, world!".toArray())
    file.close() // Release the write handle before reopening the file
    // Open
    let file2 = File("./tempFile.txt", Read)
    let bytes = readToEnd(file2) // Reads all data
    println(bytes)
    file2.close()
}
When more precise opening modes are needed, a constructor can be used with an OpenMode value. OpenMode is an enum type that provides rich file opening modes, including Read, Write, Append, and ReadWrite.
Example of using File opening modes:
// Open with specified options
let file = File("./tempFile.txt", Write)
Since opening a File instance consumes valuable system resources, it is important to close the File promptly after use to release system resources.
File implements the Resource interface, so in most cases, the try-with-resource syntax can be used to simplify operations.
Example of using try-with-resource syntax:
try (file2 = File("./tempFile.txt", Read)) {
...
// Automatically releases the file after use
}
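The fragments above can be combined into one complete program. This is a sketch under stated assumptions: ./tryDemo.txt is an assumed scratch path that does not exist beforehand, and readText is a hypothetical helper introduced only for illustration.

```cangjie
import std.fs.*
import std.io.*

// Hypothetical helper: opens path read-only and returns its content as a String.
func readText(path: String): String {
    var text = ""
    try (reader = File(path, Read)) {
        text = String.fromUtf8(readToEnd(reader))
    }
    return text
}

main() {
    let path = "./tryDemo.txt" // assumed scratch path

    // create yields a write-only instance; it is closed when the block exits.
    try (out = File.create(path)) {
        out.write("first line\n".toArray())
    }

    // Append keeps the existing content and writes at the end of the file.
    try (log = File(path, Append)) {
        log.write("second line\n".toArray())
    }

    print(readText(path))
    remove(path)
}
```

Each try block closes its File when the block ends, so no explicit close call is needed even if a write throws.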
I/O Processing Streams
Processing streams refer to streams that act as intermediaries for processing other data streams.
Common processing streams in the Cangjie programming language include BufferedInputStream, BufferedOutputStream, StringReader, StringWriter, ChainedInputStream, etc.
This chapter introduces buffered streams and string streams.
Buffered Streams
Since disk I/O operations are significantly slower than memory I/O operations, for high-frequency, small-data read/write operations, unbuffered data streams are highly inefficient—each read and write operation incurs substantial I/O overhead. Buffered data streams, however, can perform multiple read/write operations without triggering disk I/O; instead, data is temporarily stored in memory. Only when the buffer reaches its capacity is the data written to disk in a single operation. This approach dramatically reduces the number of disk operations, thereby improving performance.
The Cangjie programming language standard library provides the BufferedInputStream and BufferedOutputStream types to offer buffering functionality.
To use BufferedInputStream and BufferedOutputStream, the io package must be imported.
Example of importing the io package:
import std.io.*
The BufferedInputStream adds buffering capabilities to another input stream. Essentially, BufferedInputStream is implemented using an internal buffer array.
When reading data through BufferedInputStream, it reads an entire buffer’s worth of data at once. Subsequent read operations can then retrieve smaller chunks of data. When the buffer is exhausted, the input stream refills it. This process repeats until all data in the stream is read.
To construct a BufferedInputStream, simply pass another input stream to its constructor. To specify the buffer size, an additional capacity parameter can be provided.
Example of constructing a BufferedInputStream:
import std.io.{ByteBuffer, BufferedInputStream}
main(): Unit {
let arr1 = "0123456789".toArray()
let byteBuffer = ByteBuffer()
byteBuffer.write(arr1)
let bufferedInputStream = BufferedInputStream(byteBuffer)
let arr2 = Array<Byte>(20, repeat: 0)
/* Reads data from the stream and returns the length of the data read */
let readLen = bufferedInputStream.read(arr2)
println(String.fromUtf8(arr2[..readLen])) // 0123456789
}
The BufferedOutputStream adds buffering capabilities to another output stream. It is also implemented using an internal buffer array.
When writing data through BufferedOutputStream, the write operations first fill the internal buffer. Once the buffer is full, BufferedOutputStream writes the entire buffer’s contents to the output stream in one operation and then clears the buffer for subsequent writes. This process repeats until all data is written.
Note: Since writing data that doesn’t fill the buffer won’t trigger an output stream write operation, after completing all writes to BufferedOutputStream, an additional flush call is required to finalize the writes.
To construct a BufferedOutputStream, pass another output stream to its constructor. To specify the buffer size, an additional capacity parameter can be provided.
Example of constructing a BufferedOutputStream:
import std.io.{ByteBuffer, BufferedOutputStream, readToEnd}
main(): Unit {
let arr1 = "01234".toArray()
let byteBuffer = ByteBuffer()
byteBuffer.write(arr1)
let bufferedOutputStream = BufferedOutputStream(byteBuffer)
let arr2 = "56789".toArray()
/* Writes data to the buffered stream; the data stays in its internal buffer for now */
bufferedOutputStream.write(arr2)
/* Calls flush to push the buffered data to the wrapped stream (byteBuffer) */
bufferedOutputStream.flush()
println(String.fromUtf8(readToEnd(byteBuffer))) // 0123456789
}
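The same wrapper can be placed around a File, which is where buffering pays off for many small writes. The sketch below assumes ./buffered.txt is a scratch path that does not exist beforehand; writeRows is a hypothetical helper, not a library function.

```cangjie
import std.fs.*
import std.io.*

// Hypothetical helper: writes `rows` short lines to a new file through a buffer.
func writeRows(path: String, rows: Int64): Unit {
    try (file = File.create(path)) {
        let out = BufferedOutputStream(file)
        // Small writes accumulate in memory instead of reaching the file one by one.
        for (i in 0..rows) {
            out.write("row ${i}\n".toArray())
        }
        // Push whatever is still buffered to the file before it is closed.
        out.flush()
    }
}

main() {
    let path = "./buffered.txt" // assumed scratch path
    writeRows(path, 5)
    print(String.fromUtf8(File.readFrom(path)))
    remove(path)
}
```

Calling flush inside the try block matters: closing the File does not know about data still held by the wrapper.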
String Streams
Since the input and output streams in the Cangjie programming language are byte-based (for better performance), they can be less user-friendly in scenarios primarily involving strings, such as writing large amounts of text to a file, where text must first be converted to byte data before writing.
To provide convenient string manipulation capabilities, the Cangjie programming language offers StringReader and StringWriter for string processing.
To use StringReader and StringWriter, the io package must be imported.
Example of importing the io package:
import std.io.*
StringReader provides line-by-line reading and conditional reading capabilities, offering better performance and usability compared to manually converting byte data to strings.
To construct a StringReader, simply pass another input stream to it.
Example of using StringReader:
import std.io.{ByteBuffer, StringReader}
main(): Unit {
let arr1 = "012\n346789".toArray()
let byteBuffer = ByteBuffer()
byteBuffer.write(arr1)
let stringReader = StringReader(byteBuffer)
/* Reads a line of data */
let line = stringReader.readln()
println(line ?? "error") // 012
}
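Since readln returns an Option, a loop can consume every line until the stream is exhausted. The following is a minimal sketch; readLines is a hypothetical helper name chosen for this illustration.

```cangjie
import std.collection.ArrayList
import std.io.{ByteBuffer, StringReader}

// Hypothetical helper: splits text into lines by reading it through a StringReader.
func readLines(text: String): ArrayList<String> {
    let byteBuffer = ByteBuffer()
    byteBuffer.write(text.toArray())
    let reader = StringReader(byteBuffer)
    let lines = ArrayList<String>()
    // readln yields Some(line) while data remains and None at the end of the stream.
    while (let Some(line) <- reader.readln()) {
        lines.add(line)
    }
    return lines
}

main(): Unit {
    for (line in readLines("alpha\nbeta\ngamma")) {
        println(line)
    }
}
```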
StringWriter provides direct string writing and line-by-line string writing capabilities, offering better performance and usability compared to manually converting strings to byte data before writing.
To construct a StringWriter, simply pass another output stream to it.
Example of using StringWriter:
import std.io.{ByteBuffer, StringWriter, readToEnd}
main(): Unit {
let byteBuffer = ByteBuffer()
let stringWriter = StringWriter(byteBuffer)
/* Writes a string */
stringWriter.write("number")
/* Writes a string and automatically appends a newline */
stringWriter.writeln(" is:")
/* Writes a number */
stringWriter.write(100.0f32)
stringWriter.flush()
println(String.fromUtf8(readToEnd(byteBuffer))) // number is:\n100.000000
}
Overview of Network Programming
Network communication is the process of data exchange between two devices through a computer network. The act of achieving network communication through software development is referred to as network programming.
Cangjie provides developers with fundamental network programming capabilities. Within the Cangjie standard library, developers can use the net package under the std module (imported as std.net in the examples below) to implement transport-layer network communication.
In transport-layer protocols, there are two categories: unreliable transmission and reliable transmission. Cangjie abstracts these as DatagramSocket and StreamSocket, respectively. Among unreliable transmission protocols, UDP is the most common, while TCP is the predominant reliable transmission protocol. Cangjie abstracts these as UdpSocket and TcpSocket, respectively. Additionally, Cangjie implements support for the Unix Domain protocol at the transport layer, enabling communication through both reliable and unreliable transmission methods.
At the application layer, the HTTP protocol is widely used, particularly in web application development. Of the HTTP versions in existence, Cangjie currently supports HTTP/1.0, HTTP/1.1, and HTTP/2.0.
Furthermore, WebSocket, as an application-layer protocol designed to enhance communication efficiency between web servers and clients, is abstracted by Cangjie as the WebSocket object. Cangjie also supports protocol upgrades from HTTP to WebSocket.
It is important to note that network programming in Cangjie is blocking. However, it is the Cangjie thread that gets blocked, and a blocked Cangjie thread yields the system thread, thus not truly blocking a system thread.
Socket Programming
Cangjie’s socket programming refers to the implementation of network packet transmission functionality based on transport layer protocols.
In reliable transmission scenarios, Cangjie creates separate client and server sockets. The client socket must specify the remote address to connect to and may optionally bind to a local address. Only after a successful connection can it send and receive messages. The server socket, on the other hand, must bind to a local address; only after successful binding can it accept client connections and exchange messages over them.
In unreliable transmission scenarios, sockets do not need to distinguish between client and server roles. Cangjie creates two sockets for data transmission. The sockets must bind to local addresses, and only after successful binding can they send and receive messages. Additionally, a socket may optionally specify a remote connection address. When one is specified, the socket only accepts messages from that remote address, and when sending (via send) there is no need to specify the remote address, because messages are automatically sent to the connected address.
TCP Programming
TCP is a common reliable transmission protocol. Taking TCP sockets as an example, Cangjie’s reference programming model for reliable transmission scenarios is as follows:
- Create a server socket and specify the local binding address.
- Perform binding.
- Execute the accept operation, which will block until a client socket connection is obtained.
- Synchronously create a client socket and specify the remote address to connect to.
- Perform the connection.
- After successful connection, the server will return a new socket through the accept interface. At this point, the server can perform read/write operations (i.e., send/receive messages) through this socket, while the client can directly perform read/write operations.
Example of TCP server and client programs:
import std.time.*
import std.sync.*
import std.net.*
var SERVER_PORT: UInt16 = 0
func runTcpServer() {
try (serverSocket = TcpServerSocket(bindAt: SERVER_PORT)) {
serverSocket.bind()
SERVER_PORT = (serverSocket.localAddress as IPSocketAddress)?.port ?? 0
try (client = serverSocket.accept()) {
let buf = Array<Byte>(10, repeat: 0)
let count = client.read(buf)
// Data read by server: [1, 2, 3, 0, 0, 0, 0, 0, 0, 0]
println("Server read ${count} bytes: ${buf}")
}
}
}
main(): Int64 {
let future = spawn {
runTcpServer()
}
sleep(Duration.millisecond * 500)
try (socket = TcpSocket("127.0.0.1", SERVER_PORT)) {
socket.connect()
socket.write([1, 2, 3])
}
future.get()
return 0
}
Compiling and executing the above code will print:
Server read 3 bytes: [1, 2, 3, 0, 0, 0, 0, 0, 0, 0]
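Because the socket returned by accept supports both reading and writing, the same model extends to a request/reply exchange. The sketch below is a variation on the example above, not a reference implementation: ECHO_PORT, runEchoServer, and requestEcho are hypothetical names, and the fixed 500 ms sleep is the same simplification the original example uses to wait for the server.

```cangjie
import std.time.*
import std.sync.*
import std.net.*

var ECHO_PORT: UInt16 = 0 // filled in after bind, as in the example above

// Hypothetical server: accepts one client and sends back the bytes it received.
func runEchoServer() {
    try (serverSocket = TcpServerSocket(bindAt: ECHO_PORT)) {
        serverSocket.bind()
        ECHO_PORT = (serverSocket.localAddress as IPSocketAddress)?.port ?? 0
        try (client = serverSocket.accept()) {
            let buf = Array<Byte>(16, repeat: 0)
            let count = client.read(buf)
            client.write(buf[..count])
        }
    }
}

// Hypothetical client: sends a message and returns the reply as a String.
func requestEcho(message: String): String {
    var reply = ""
    try (socket = TcpSocket("127.0.0.1", ECHO_PORT)) {
        socket.connect()
        socket.write(message.toArray())
        let buf = Array<Byte>(16, repeat: 0)
        let count = socket.read(buf)
        reply = String.fromUtf8(buf[..count])
    }
    return reply
}

main(): Int64 {
    let future = spawn {
        runEchoServer()
    }
    sleep(Duration.millisecond * 500)
    println("Client got: ${requestEcho("ping")}")
    future.get()
    return 0
}
```

The 16-byte buffers are an arbitrary choice for this short message; a real protocol would need its own framing to handle longer data.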
UDP Programming
UDP is a common unreliable transmission protocol. Taking UDP sockets as an example, Cangjie’s reference programming model for unreliable transmission scenarios is as follows:
- Create a socket and specify the local binding address.
- Perform binding.
- Specify the remote address for message sending.
- Without connecting to a remote address, the socket can receive messages from different remote addresses and return the remote address information.
Example of UDP message sending/receiving program:
import std.time.*
import std.sync.*
import std.net.*
let SERVER_PORT: UInt16 = 8080
func runUdpServer() {
    try (serverSocket = UdpSocket(bindAt: SERVER_PORT)) {
        serverSocket.bind()
        let buf = Array<Byte>(3, repeat: 0)
        let (clientAddr, count) = serverSocket.receiveFrom(buf)
        let sender = (clientAddr as IPSocketAddress)?.address.toString() ?? ""
        // Server receive 3 bytes: [1, 2, 3] from 127.0.0.1
        println("Server receive ${count} bytes: ${buf} from ${sender}")
    }
}
main(): Int64 {
    let future = spawn {
        runUdpServer()
    }
    sleep(Duration.second)
    try (udpSocket = UdpSocket(bindAt: 0)) {
        udpSocket.sendTimeout = Duration.second * 2
        udpSocket.bind()
        udpSocket.sendTo(
            IPSocketAddress("127.0.0.1", SERVER_PORT),
            [1, 2, 3]
        )
    }
    future.get()
    return 0
}
Compiling and executing the above code will print:
Server receive 3 bytes: [1, 2, 3] from 127.0.0.1
HTTP Programming
HTTP, as a universal application-layer protocol, facilitates data transmission through a request-response mechanism where the client sends requests and the server returns responses. The format of requests and responses is fixed, consisting of a header and a body.
The most commonly used request types are GET and POST. A GET request contains only a header and is used to request application-layer data from the server. A POST request includes a body, separated from the header by an empty line, and is used to provide application-layer data to the server.
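To make the header/body split concrete, the sketch below builds the textual form of both request types as plain strings. It uses no HTTP library and only illustrates the HTTP/1.x text format (HTTP/2 uses binary framing instead); buildGet and buildPost are hypothetical helper names, and the Host value and body content are assumptions for illustration.

```cangjie
// Hypothetical helper: a GET request is a request line plus header fields,
// terminated by an empty line, with no body.
func buildGet(path: String): String {
    return "GET ${path} HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n"
}

// Hypothetical helper: a POST request carries a body after the empty line.
func buildPost(path: String, body: String): String {
    // body.size is the length of the body in bytes.
    let header = "POST ${path} HTTP/1.1\r\nHost: 127.0.0.1\r\nContent-Length: ${body.size}\r\n"
    return "${header}\r\n${body}"
}

main() {
    print(buildGet("/hello"))
    println(buildPost("/hello", "name=Cangjie"))
}
```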
The header fields of request-response messages are numerous and will not be exhaustively detailed here. Cangjie supports HTTP protocol versions 1.0/1.1/2.0, among others. Developers can construct request and response messages based on RFCs 9110, 9112, 9113, 9218, 7541, and the HttpRequestBuilder and HttpResponseBuilder classes provided by Cangjie.
The following example demonstrates how to use Cangjie for client and server programming. The functionality implemented involves the client sending a request with the header GET /hello, and the server returning a response with the body "Hello Cangjie!". The code is as follows:
Note:
Libraries such as net and log have been moved from the Cangjie SDK to the stdx module. Before use, download the software package and configure it in cjpm.toml.
import stdx.net.http.*
import std.time.*
import std.sync.*
import stdx.log.*
// 1. Build a Server instance
let server = ServerBuilder()
.addr("127.0.0.1")
.port(0)
.build()
func startServer(): Unit {
// 2. Register request handling logic
server.distributor.register("/hello", {httpContext =>
httpContext.responseBuilder.body("Hello Cangjie!")
})
server.logger.level = LogLevel.OFF
// 3. Start the service
server.serve()
}
func startClient(): Unit {
// 1. Build a client instance
let client = ClientBuilder().build()
// 2. Send a request
let response = client.get("http://127.0.0.1:${server.port}/hello")
// 3. Read the response body
let buffer = Array<Byte>(32, repeat: 0)
let length = response.body.read(buffer)
println(String.fromUtf8(buffer[..length]))
// 4. Close the connection
client.close()
}
main () {
spawn {
startServer()
}
sleep(Duration.second)
startClient()
}
Compiling and executing the above code will print:
Hello Cangjie!
WebSocket Programming
In network programming, WebSocket is a commonly used application-layer protocol. Like HTTP, it is also built on top of the TCP protocol and is frequently employed in web server application development.
Unlike HTTP, WebSocket only requires a single handshake between the client and server to establish a persistent connection, enabling bidirectional data transmission. This means that WebSocket-based servers can actively send data to clients, thereby achieving real-time communication.
WebSocket is an independent protocol. Its connection to HTTP lies in the fact that its handshake is interpreted by HTTP servers as an upgrade request. Therefore, Cangjie includes WebSocket within the http package.
Cangjie abstracts the WebSocket communication mechanism into the WebSocket class, providing methods to upgrade an HTTP/1.1 or HTTP/2.0 server handle to a WebSocket protocol instance. Communication is then conducted through the returned WebSocket instance, such as reading and writing data packets.
In Cangjie, the fundamental unit of data transmitted via WebSocket is called a frame. Frames are divided into two categories: one type transmits control information, including Close Frame for closing connections, Ping Frame for implementing Keep-Alive, and Pong Frame as the response type to Ping Frame. The other type transmits application data, supporting segmented transmission.
Cangjie’s frames consist of three attributes: fin and frameType together indicate whether the frame is segmented and its type, while payload represents the frame’s data payload. Developers do not need to concern themselves with other attributes for packet transmission.
The following example demonstrates the WebSocket handshake and message exchange process: creating an HTTP client and server, initiating WebSocket upgrade (or handshake) respectively, and beginning frame read/write operations after a successful handshake.
Note:
Libraries such as net, encoding, and log have been moved from the Cangjie SDK to the stdx module. Before use, download the package and configure it in cjpm.toml.
import stdx.net.http.*
import stdx.encoding.url.*
import std.time.*
import std.sync.*
import std.collection.*
import stdx.log.*
let server = ServerBuilder()
.addr("127.0.0.1")
.port(0)
.build()
// Client:
main() {
// 1 Start the server
spawn { startServer() }
sleep(Duration.millisecond * 200)
let client = ClientBuilder().build()
let u = URL.parse("ws://127.0.0.1:${server.port}/webSocket")
let subProtocol = ArrayList<String>(["foo1", "bar1"])
let headers = HttpHeaders()
headers.add("test", "echo")
// 2 Complete WebSocket handshake and obtain WebSocket instance
let websocket: WebSocket
let respHeaders: HttpHeaders
(websocket, respHeaders) = WebSocket.upgradeFromClient(client, u, subProtocols: subProtocol, headers: headers)
client.close()
println("subProtocol: ${websocket.subProtocol}") // foo1
println(respHeaders.getFirst("rsp") ?? "") // echo
// 3 Message exchange
// Send "hello"
websocket.write(TextWebFrame, "hello".toArray())
// Receive
let data = ArrayList<UInt8>()
var frame = websocket.read()
while(true) {
match(frame.frameType) {
case ContinuationWebFrame =>
data.add(all: frame.payload)
if (frame.fin) {
break
}
case TextWebFrame | BinaryWebFrame =>
if (!data.isEmpty()) {
throw Exception("invalid frame")
}
data.add(all: frame.payload)
if (frame.fin) {
break
}
case CloseWebFrame =>
websocket.write(CloseWebFrame, frame.payload)
break
case PingWebFrame =>
websocket.writePongFrame(frame.payload)
case _ => ()
}
frame = websocket.read()
}
println("data size: ${data.size}") // 4097
println("last item: ${String.fromUtf8(data.toArray()[4096..])}") // a
// 4 Close WebSocket
// Exchange CloseFrame
websocket.writeCloseFrame(status: 1000)
let websocketFrame = websocket.read()
println("close frame type: ${websocketFrame.frameType}") // CloseWebFrame
println("close frame payload: ${websocketFrame.payload}") // 3, 232
// Close underlying connection
websocket.closeConn()
server.close()
}
func startServer() {
// 1 Register handler
server.distributor.register("/webSocket", handler1)
server.logger.level = LogLevel.OFF
server.serve()
}
// Server:
func handler1(ctx: HttpContext): Unit {
// 2 Complete WebSocket handshake and obtain WebSocket instance
let websocketServer = WebSocket.upgradeFromServer(ctx, subProtocols: ArrayList<String>(["foo", "bar", "foo1"]),
userFunc: {request: HttpRequest =>
let value = request.headers.getFirst("test") ?? ""
let headers = HttpHeaders()
headers.add("rsp", value)
headers
})
// 3 Message exchange
// Receive "hello"
let data = ArrayList<UInt8>()
var frame = websocketServer.read()
while(true) {
match(frame.frameType) {
case ContinuationWebFrame =>
data.add(all: frame.payload)
if (frame.fin) {
break
}
case TextWebFrame | BinaryWebFrame =>
if (!data.isEmpty()) {
throw Exception("invalid frame")
}
data.add(all: frame.payload)
if (frame.fin) {
break
}
case CloseWebFrame =>
websocketServer.write(CloseWebFrame, frame.payload)
break
case PingWebFrame =>
websocketServer.writePongFrame(frame.payload)
case _ => ()
}
frame = websocketServer.read()
}
println("data: ${String.fromUtf8(data.toArray())}") // hello
// Send 4097 'a's
websocketServer.write(TextWebFrame, Array<UInt8>(4097, repeat: 97))
// 4 Close WebSocket
// Exchange CloseFrame
let websocketFrame = websocketServer.read()
println("close frame type: ${websocketFrame.frameType}") // CloseWebFrame
println("close frame payload: ${websocketFrame.payload}") // 3, 232
websocketServer.write(CloseWebFrame, websocketFrame.payload)
// Close underlying connection
websocketServer.closeConn()
}
The example produces the following output:
subProtocol: foo1
echo
data: hello
data size: 4097
last item: a
close frame type: CloseWebFrame
close frame payload: [3, 232]
close frame type: CloseWebFrame
close frame payload: [3, 232]
Introduction to Macros
Macros can be understood as a special type of function. While regular functions perform computations on input values and output a new value, macros take and return the program itself. They accept a piece of code as input and output a new piece of code, which is then used for compilation and execution. To distinguish macro calls from function calls, macros are invoked using @ followed by the macro name.
The following example demonstrates printing both the value and the expression itself during debugging:
let x = 3
let y = 2
@dprint(x) // Prints "x = 3"
@dprint(x + y) // Prints "x + y = 5"
Clearly, dprint cannot be implemented as a regular function because functions only receive the value of the input expression, not the expression itself. However, dprint can be implemented as a macro to access the code fragment of the input expression. A basic implementation is shown below:
macro package define
import std.ast.*
public macro dprint(input: Tokens): Tokens {
let inputStr = input.toString()
let result = quote(
print($(inputStr) + " = ")
println($(input)))
return result
}
Before explaining each line, let’s verify this macro achieves the desired effect. First, create a define directory in the current folder, then create a dprint.cj file inside it with the above content. Additionally, create a main.cj file in the current directory with the following test code:
import define.*
main() {
let x = 3
let y = 2
@dprint(x)
@dprint(x + y)
}
Note the resulting directory structure:
// Directory layout.
src
|-- define
| `-- dprint.cj
`-- main.cj
In the current directory (src), run the compilation commands:
cjc define/*.cj --compile-macro
cjc main.cj -o main
Then execute ./main to see the following output:
x = 3
x + y = 5
Now let’s examine each part of the code:
- Line 1: macro package define
  Macros must be declared in separate packages (they cannot coexist with other public functions). Packages containing macros are declared using macro package. Here we declare a macro package named define.
- Line 2: import std.ast.*
  The data types required for macro implementation, such as Tokens and syntax node types (to be discussed later), are located in the ast package of the Cangjie standard library. Therefore, any macro implementation must first import the ast package.
- Line 3: public macro dprint(input: Tokens): Tokens
  This declares a macro named dprint. Since this is a non-attribute macro (this concept will be explained later), it accepts a parameter of type Tokens, representing the code fragment passed to the macro. The macro’s return value is also a code fragment.
- Line 4: let inputStr = input.toString()
  In the macro implementation, we first convert the input code fragment to a string. In our test cases, inputStr becomes "x" or "x + y".
- Lines 5-7: let result = quote(...)
  The quote expression is used to construct Tokens. It converts the code fragment within parentheses into Tokens. Inside quote, interpolation $(...) can be used to convert the enclosed expression into Tokens and insert it into the Tokens constructed by quote. In this code, $(inputStr) inserts the value of inputStr (including quotation marks), while $(input) inserts the input code fragment. Thus, if the input expression is x + y, the resulting Tokens would be:
  print("x + y" + " = ")
  println(x + y)
- Line 8: return result
  Finally, the constructed code fragment is returned. These two lines of code will be compiled, and when executed, will output x + y = 5.
Reviewing the dprint macro definition: it uses Tokens as input and employs quote with interpolation to construct another Tokens as output. To work with macros effectively, we need detailed understanding of Tokens, quote, and interpolation concepts, which will be introduced separately below.
Token-Related Types and Quote Expressions
Token Type
The fundamental type for macro operations is Tokens, representing a code fragment. Tokens consist of multiple Token elements, where each Token can be understood as a user-operable lexical unit. A Token may be an identifier (e.g., variable names), literal (e.g., integers, floats, strings), keyword, or operator. Each Token contains its type, content, and positional information.
The type of a Token is an enum value from TokenKind. For available values of TokenKind, refer to the Cangjie Programming Language Library API documentation. By providing TokenKind and Token values (the identifier or literal corresponding to TokenKind), any Token can be directly constructed. The specific constructors are as follows:
Token(k: TokenKind)
Token(k: TokenKind, v: String)
Below are some examples of Token construction:
import std.ast.*
let tk1 = Token(TokenKind.ADD) // '+' operator
let tk2 = Token(TokenKind.FUNC) // func keyword
let tk3 = Token(TokenKind.IDENTIFIER, "x") // x identifier
let tk4 = Token(TokenKind.INTEGER_LITERAL, "3") // integer literal
let tk5 = Token(TokenKind.STRING_LITERAL, "xyz") // string literal
Tokens Type
A Tokens represents a sequence composed of multiple Token elements. Tokens can be constructed directly from an array of Token. Below are three basic ways to construct Tokens instances:
Tokens() // construct an empty list
Tokens(tks: Array<Token>)
Tokens(tks: ArrayList<Token>)
Additionally, the Tokens type supports the following functionalities:
- size: Returns the number of Token elements contained in Tokens
- get(index: Int64): Retrieves the Token element at the specified index
- []: Retrieves the Token element at the specified index
- +: Concatenates two Tokens or directly concatenates Tokens with a Token
- dump(): Prints all contained Token elements for debugging purposes
- toString(): Prints the code fragment corresponding to Tokens
In the following example, constructors are used to directly create Token and Tokens, followed by printing detailed debugging information:
import std.ast.*
let tks = Tokens([
Token(TokenKind.INTEGER_LITERAL, "1"),
Token(TokenKind.ADD),
Token(TokenKind.INTEGER_LITERAL, "2")
])
main() {
println(tks)
tks.dump()
}
The expected output is as follows (specific positional information may vary):
1 + 2
description: integer_literal, token_id: 140, token_literal_value: 1, fileID: 1, line: 4, column: 5
description: add, token_id: 12, token_literal_value: +, fileID: 1, line: 5, column: 5
description: integer_literal, token_id: 140, token_literal_value: 2, fileID: 1, line: 6, column: 5
The dump information includes each Token’s type (description) and value (token_literal_value), followed by the positional information of each Token.
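The operations listed for Tokens can be tried out on a small sequence. The following is a sketch only: buildSum is a hypothetical helper, and the value property read from a Token is assumed to hold the token's literal text (the same text that dump reports as token_literal_value).

```cangjie
import std.ast.*

// Hypothetical helper: assembles the tokens of `x + 1` step by step.
func buildSum(): Tokens {
    var tks = Tokens()                                  // empty sequence
    tks = tks + Token(TokenKind.IDENTIFIER, "x")        // Tokens + Token
    tks = tks + Token(TokenKind.ADD)
    tks = tks + Tokens([Token(TokenKind.INTEGER_LITERAL, "1")]) // Tokens + Tokens
    return tks
}

main() {
    let tks = buildSum()
    println("size: ${tks.size}")
    println("first: ${tks[0].value}")
    println("code: ${tks.toString()}")
}
```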
Quote Expression and Interpolation
In most cases, directly constructing and concatenating Tokens can be cumbersome. Therefore, the Cangjie language provides the quote expression to construct Tokens from code templates. The term “code template” is used because quote allows the use of $(...) to interpolate expressions from the context. The interpolated expressions must be convertible to Tokens (specifically, they must implement the ToTokens interface). In the standard library, the following types implement the ToTokens interface:
- All node types (nodes will be discussed in Syntax Nodes)
- Token and Tokens types
- All primitive data types: integers, floats, Bool, Rune, and String
- Array<T> and ArrayList<T>, where T has type restrictions and outputs different delimiters based on T’s type. For details, refer to the Cangjie Programming Language Library API documentation.
The following example demonstrates interpolation with Array and primitive data types:
import std.ast.*
let intList: Array<Int64> = [1, 2, 3, 4, 5]
let float: Float64 = 1.0
let str: String = "Hello"
let tokens = quote(
arr = $(intList)
x = $(float)
s = $(str)
)
main() {
println(tokens)
}
The output is:
arr =[1, 2, 3, 4, 5]
x = 1.0
s = "Hello"
For more interpolation usage, refer to Using Quote to Interpolate Syntax Nodes.
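Since Tokens itself implements ToTokens, one quote result can be spliced into another. A minimal sketch, assuming a hypothetical helper named scale that wraps an expression fragment in parentheses and multiplies it by an integer:

```cangjie
import std.ast.*

// Hypothetical helper: produces the tokens of `(<expr>) * <factor>`.
func scale(expr: Tokens, factor: Int64): Tokens {
    // $(expr) splices the token sequence; $(factor) becomes an integer literal.
    return quote(($(expr)) * $(factor))
}

main() {
    let sum = quote(a + b)
    println(scale(sum, 2))
}
```

The parentheses in the template are balanced, so no escaping is needed here.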
Specifically, when a quote expression contains certain special Token elements, escaping is required:
- Unmatched parentheses are not allowed in quote expressions, but parentheses escaped with \ are exempt from the matching rules.
- When $ represents a regular Token rather than code interpolation, it must be escaped with \.
- Except for the above cases, the presence of \ in quote expressions will result in a compilation error.
Note:
The # symbol can only be used to construct multiline raw string literals and cannot be used standalone.
Below are some examples of quote expressions containing these special Token elements:
import std.ast.*
let tks1 = quote((x)) // ok
let tks2 = quote(\() // ok
let tks3 = quote( ( \) ) ) // ok
let tks4 = quote()) // error: unmatched delimiter: ')'
let tks5 = quote( ( \) ) // error: unclosed delimiter: '('
let tks6 = quote(\$(1)) // ok
let tks7 = quote(\x) // error: unknown start of token: \
let tks8 = quote(#) // error: expected '#' or '"' in raw string
Syntax Nodes
In the compilation process of the Cangjie language, the code is first converted into Tokens through lexical analysis, followed by syntactic parsing of the Tokens to generate a syntax tree. Each node in the syntax tree may represent an expression, declaration, type, pattern, etc. The Cangjie standard library std.ast package provides corresponding classes for each type of node, with appropriate inheritance relationships. The main abstract classes are as follows:
- Node: The parent class of all syntax nodes
- TypeNode: The parent class of all type nodes
- Expr: The parent class of all expression nodes
- Decl: The parent class of all declaration nodes
- Pattern: The parent class of all pattern nodes
There are numerous specific node types. For detailed information, please refer to the Cangjie Programming Language Library API. The examples below mainly use two of them:
- BinaryExpr: Binary operation expressions
- FuncDecl: Function declarations
Parsing Nodes
Using the std.ast standard library package, virtually any node can be parsed from Tokens. There are two methods for parsing Tokens and constructing syntax nodes.
Parsing Tokens Using Parsing Functions
The following functions are used to parse and construct arbitrary syntax nodes from Tokens:
- parseExpr(input: Tokens): Expr: Parses the input Tokens into an expression node.
- parseExprFragment(input: Tokens, startFrom!: Int64 = 0): (Expr, Int64): Parses a fragment of the input Tokens into an expression node, starting from the startFrom index. The parsing may consume only part of the fragment starting from startFrom and returns the index of the first unconsumed Token (if the entire fragment is consumed, the return value is input.size).
- parseDecl(input: Tokens, astKind!: String = ""): Decl: Parses the input Tokens into a declaration node. astKind provides additional settings; refer to the Cangjie Programming Language Library API for details.
- parseDeclFragment(input: Tokens, startFrom!: Int64 = 0): (Decl, Int64): Parses a fragment of the input Tokens into a declaration node. The startFrom parameter and the meaning of the returned index are the same as in parseExprFragment.
- parseType(input: Tokens): TypeNode: Parses the input Tokens into a type node.
- parseTypeFragment(input: Tokens, startFrom!: Int64 = 0): (TypeNode, Int64): Parses a fragment of the input Tokens into a type node. The startFrom parameter and the meaning of the returned index are the same as in parseExprFragment.
- parsePattern(input: Tokens): Pattern: Parses the input Tokens into a pattern node.
- parsePatternFragment(input: Tokens, startFrom!: Int64 = 0): (Pattern, Int64): Parses a fragment of the input Tokens into a pattern node. The startFrom parameter and the meaning of the returned index are the same as in parseExprFragment.
If parsing fails, an exception is thrown. This parsing method is suitable for code fragments of unknown types. If a specific subtype node is required, the parsing result must be manually cast to the corresponding subtype.
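The manual cast mentioned above can be sketched as follows. This is a minimal illustration rather than a canonical pattern: it assumes the snippet lives in an ordinary package that imports std.ast, and it uses only parseExpr, the as operator, and the op component described in this chapter.

```cangjie
import std.ast.*

main(): Int64 {
    // parseExpr returns the general Expr type
    let expr = parseExpr(quote(a + b))
    // 'as' yields an Option, so both outcomes must be handled
    match (expr as BinaryExpr) {
        case Some(bin) => println("operator: ${bin.op.value}")
        case None => println("not a binary expression")
    }
    return 0
}
```

Handling the None branch explicitly avoids the exception that getOrThrow() would raise when the parsed node turns out to be a different subtype.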
Usage examples of these functions are shown below:
let tks1 = quote(a + b)
let tks2 = quote(u + v, x + y)
let tks3 = quote(
func f1(x: Int64) { return x + 1 }
)
let tks4 = quote(
func f2(x: Int64) { return x + 2 }
func f3(x: Int64) { return x + 3 }
)
let binExpr1 = parseExpr(tks1)
let (binExpr2, mid) = parseExprFragment(tks2)
let (binExpr3, _) = parseExprFragment(tks2, startFrom: mid + 1) // Skip the comma
println("binExpr1 = ${binExpr1.toTokens()}")
println("binExpr2 = ${binExpr2.toTokens()}, binExpr3 = ${binExpr3.toTokens()}")
let funcDecl1 = parseDecl(tks3)
let (funcDecl2, mid2) = parseDeclFragment(tks4)
let (funcDecl3, _) = parseDeclFragment(tks4, startFrom: mid2)
println("${funcDecl1.toTokens()}")
println("${funcDecl2.toTokens()}")
println("${funcDecl3.toTokens()}")
Output:
binExpr1 = a + b
binExpr2 = u + v, binExpr3 = x + y
func f1(x: Int64) {
return x + 1
}
func f2(x: Int64) {
return x + 2
}
func f3(x: Int64) {
return x + 3
}
Parsing Tokens Using Syntax Node Constructors
Most syntax nodes support the init(input: Tokens) constructor, which parses the input Tokens into a node of the corresponding type. For example:
import std.ast.*
let binExpr = BinaryExpr(quote(a + b))
let funcDecl = FuncDecl(quote(func f1(x: Int64) { return x + 1 }))
If parsing fails, an exception is thrown. This parsing method is suitable for code fragments of known types, eliminating the need for manual casting to specific subtypes after parsing.
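A parse failure can be handled with an ordinary try/catch. The sketch below deliberately passes an expression to the FuncDecl constructor; catching the general Exception type is an assumption made for simplicity, since the concrete exception type is documented in the Library API.

```cangjie
import std.ast.*

main(): Int64 {
    try {
        // a + b is an expression, not a function declaration
        let decl = FuncDecl(quote(a + b))
        println(decl.toTokens())
    } catch (e: Exception) {
        println("parse failed: ${e.message}")
    }
    return 0
}
```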
Components of Nodes
After parsing nodes from Tokens, you can examine their components. For illustration, only the components of BinaryExpr and FuncDecl are listed here. For more detailed explanations of other nodes, refer to the Cangjie Programming Language Library API.
BinaryExpr node:
- leftExpr: Expr: The expression on the left side of the operator
- op: Token: The operator
- rightExpr: Expr: The expression on the right side of the operator
FuncDecl node (partial):
- identifier: Token: The function name
- funcParams: ArrayList<FuncParam>: The parameter list
- declType: TypeNode: The return type
- block: Block: The function body
FuncParam node (partial):
- identifier: Token: The parameter name
- paramType: TypeNode: The parameter type
Block node (partial):
- nodes: ArrayList<Node>: Expressions and declarations within the block
Each component is a public mut prop and can be inspected and updated. The results of such updates are demonstrated in the following examples.
BinaryExpr Example
let binExpr = BinaryExpr(quote(x * y))
binExpr.leftExpr = BinaryExpr(quote(a + b))
println(binExpr.toTokens())
binExpr.op = Token(TokenKind.ADD)
println(binExpr.toTokens())
Output:
(a + b) * y
a + b + y
First, parsing yields binExpr as the node x * y, represented as:
  *
 / \
x   y
Next, the left node (x) is replaced with a + b, resulting in the following syntax tree:
    *
   / \
  +   y
 / \
a   b
When outputting this syntax tree, parentheses must be added around a + b to yield (a + b) * y (outputting a + b * y would imply multiplication before addition, which contradicts the syntax tree’s meaning). The ast library automatically adds parentheses when outputting syntax trees.
Finally, the operator at the root of the syntax tree is changed from * to +, resulting in:
    +
   / \
  +   y
 / \
a   b
This syntax tree can be output as a + b + y since addition is left-associative and does not require parentheses on the left side.
FuncDecl Example
let funcDecl = FuncDecl(quote(func f1(x: Int64) { x + 1 }))
funcDecl.identifier = Token(TokenKind.IDENTIFIER, "foo")
println("Number of parameters: ${funcDecl.funcParams.size}")
funcDecl.funcParams[0].identifier = Token(TokenKind.IDENTIFIER, "a")
println("Number of nodes in body: ${funcDecl.block.nodes.size}")
let binExpr = (funcDecl.block.nodes[0] as BinaryExpr).getOrThrow()
binExpr.leftExpr = parseExpr(quote(a))
println(funcDecl.toTokens())
In this example, a FuncDecl node is first constructed through parsing. The function name, parameter name, and part of the expression in the function body are then modified. Output:
Number of parameters: 1
Number of nodes in body: 1
func foo(a: Int64) {
a + 1
}
Interpolating Syntax Nodes Using quote
Any syntax node can be interpolated within a quote statement. Certain ArrayList collections of syntax nodes can also be interpolated (mainly those that correspond to node lists occurring in real code, such as the contents of a block). Interpolation is written as $(node), where node is an instance of any node type.
The following examples demonstrate node interpolation.
var binExpr = BinaryExpr(quote(1 + 2))
let a = quote($(binExpr))
let b = quote($binExpr)
let c = quote($(binExpr.leftExpr))
let d = quote($binExpr.leftExpr)
println("a: ${a.toTokens()}")
println("b: ${b.toTokens()}")
println("c: ${c.toTokens()}")
println("d: ${d.toTokens()}")
Output:
a: 1 + 2
b: 1 + 2
c: 1
d: 1 + 2.leftExpr
Generally, the expression following the interpolation operator is enclosed in parentheses to define its scope, e.g., $(binExpr). However, when followed by a single identifier, parentheses can be omitted, as in $binExpr. Thus, in the example, both a and b interpolate the binExpr node within quote, resulting in 1 + 2. However, if the expression following the interpolation operator is more complex, omitting parentheses may lead to scope errors. For instance, the expression binExpr.leftExpr evaluates to the left expression of 1 + 2, i.e., 1, so c is correctly assigned 1. However, in d, the interpolation is interpreted as ($binExpr).leftExpr, resulting in 1 + 2.leftExpr. To clarify the scope of interpolation, it is recommended to use parentheses with the interpolation operator.
The following example demonstrates interpolation of node lists (ArrayList).
var incrs = ArrayList<Node>()
for (i in 1..=5) {
incrs.add(parseExpr(quote(x += $(i))))
}
var foo = quote(
func foo(n: Int64) {
var x = n
$(incrs)
x
})
println(foo)
Output:
func foo(n: Int64) {
var x = n
x += 1
x += 2
x += 3
x += 4
x += 5
x
}
In this example, a node list incrs is created, containing expressions x += 1, …, x += 5. Interpolating incrs lists the nodes sequentially, with line breaks after each node. This is useful for inserting expressions and declarations that need to be executed sequentially.
The following example demonstrates cases where parentheses are necessary around interpolations to ensure correctness.
var binExpr1 = BinaryExpr(quote(x + y))
var binExpr2 = BinaryExpr(quote($(binExpr1) * z)) // Incorrect: yields x + y * z
println("binExpr2: ${binExpr2.toTokens()}")
println("binExpr2.leftExpr: ${binExpr2.leftExpr.toTokens()}")
println("binExpr2.rightExpr: ${binExpr2.rightExpr.toTokens()}")
var binExpr3 = BinaryExpr(quote(($(binExpr1)) * z)) // Correct: yields (x + y) * z
println("binExpr3: ${binExpr3.toTokens()}")
Output:
binExpr2: x + y * z
binExpr2.leftExpr: x
binExpr2.rightExpr: y * z
binExpr3: (x + y) * z
First, the expression x + y is constructed and then interpolated into the template $(binExpr1) * z. The intention is to obtain an expression that first computes x + y and then multiplies by z. However, the interpolation yields x + y * z, which computes y * z first and then adds x. This occurs because interpolation does not automatically add parentheses to ensure the atomicity of the interpolated expression (unlike the replacement of leftExpr described earlier). Thus, parentheses must be added around $(binExpr1) to ensure the correct result.
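An alternative that avoids manual parentheses is to assign the node to a component instead of interpolating it, relying on the automatic parenthesization shown in the earlier BinaryExpr example. The following is a sketch under that assumption; lhs is only a placeholder identifier that gets replaced.

```cangjie
import std.ast.*

main(): Int64 {
    let sum = BinaryExpr(quote(x + y))
    // Parse a placeholder product, then swap in the real left operand
    let product = BinaryExpr(quote(lhs * z))
    product.leftExpr = sum
    println(product.toTokens())
    return 0
}
```

By the same rule that produced (a + b) * y earlier, this should print the product with x + y parenthesized, because the tree structure rather than the token sequence determines the output.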
Macro Implementation
This chapter introduces the definition and usage of Cangjie macros, which can be categorized into Non-Attribute Macros and Attribute Macros. Additionally, this chapter will cover the behavior when macros are nested.
Non-Attribute Macros
Non-attribute macros only accept the code to be transformed and do not take other parameters (attributes). Their definition format is as follows:
import std.ast.*
public macro MacroName(args: Tokens): Tokens {
... // Macro body
}
The macro invocation format is as follows:
@MacroName(...)
Macro invocations are enclosed in (). The content inside the parentheses can be any valid Tokens or empty.
When a macro is applied to a declaration, the parentheses can generally be omitted. Refer to the following examples:
@MacroName func name() {} // Before a FuncDecl
@MacroName struct name {} // Before a StructDecl
@MacroName class name {} // Before a ClassDecl
@MacroName var a = 1 // Before a VarDecl
@MacroName enum e {} // Before an EnumDecl
@MacroName interface i {} // Before an InterfaceDecl
@MacroName extend e <: i {} // Before an ExtendDecl
class C {
@MacroName prop i: Int64 { // Before a PropDecl
get() { 0 }
}
}
@MacroName @AnotherMacro(input) // Before a macro call
Special notes on the legality of Tokens within parentheses:
- The input must consist of a sequence of valid Tokens. Symbols like “#”, “`”, “\”, etc., when used alone, are not valid Cangjie Tokens and are not supported as input values.
- If the input contains unmatched parentheses, they must be escaped using the escape symbol “\”.
- If the input contains “@” as a Token, it must be escaped using the escape symbol “\”.
For special cases of input, refer to the following examples:
// Illegal input Tokens
@MacroName(#) // Not a complete Token
@MacroName(`) // Not a complete Token
@MacroName(() // ( and ) do not match
@MacroName(\[) // Escaping is not supported for this symbol
// Legal input Tokens
@MacroName(#"abc"#)
@MacroName(`class`)
@MacroName([)
@MacroName([])
@MacroName(\()
@MacroName(\@)
The macro expansion process operates on the Cangjie syntax tree. After expansion, the compiler continues with subsequent compilation steps. Therefore, the following rules must be observed:
- The expanded code must still be valid Cangjie code, and the expanded code must not contain package declarations or import statements, as this may cause compilation issues.
- When a macro is used for a declaration, if parentheses are omitted, the input must be syntactically valid for the declaration.
Below are several typical examples of macro applications.
- Example 1
Macro definition file macro_definition.cj:
    macro package macro_definition
    import std.ast.*
    public macro testDef(input: Tokens): Tokens {
        println("I'm in macro body")
        return input
    }
Macro invocation file macro_call.cj:
    package macro_calling
    import macro_definition.*
    main(): Int64 {
        println("I'm in function body")
        let a: Int64 = @testDef(1 + 2)
        println("a = ${a}")
        return 0
    }
For the compilation process of the above code, see Macro Compilation and Usage.
Print statements have been added to the example. The I'm in macro body message in the macro definition is output while macro_call.cj is being compiled. At the same time, the macro invocation point is expanded. For example, when compiling the following code:
    let a: Int64 = @testDef(1 + 2)
the compiler replaces the invocation in the syntax tree with the Tokens returned by the macro, resulting in the following code:
    let a: Int64 = 1 + 2
In other words, the actual code in the executable becomes:
    main(): Int64 {
        println("I'm in function body")
        let a: Int64 = 1 + 2
        println("a = ${a}")
        return 0
    }
The value of a is computed as 3, and when a is printed it is interpolated as 3. Thus, the output of the above program is:
    I'm in function body
    a = 3
Now, let’s look at a more meaningful example of using a macro to process a function. The ModifyFunc macro adds an id parameter to myFunc and inserts code before and after counter++.
- Example 2
Macro definition file macro_definition.cj:
    // file macro_definition.cj
    macro package macro_definition
    import std.ast.*
    public macro ModifyFunc(input: Tokens): Tokens {
        println("I'm in macro body")
        let funcDecl = FuncDecl(input)
        return quote(
            func $(funcDecl.identifier)(id: Int64) {
                println("start ${id}")
                $(funcDecl.block.nodes)
                println("end")
            })
    }
Macro invocation file macro_call.cj:
    package macro_calling
    import macro_definition.*
    var counter = 0
    @ModifyFunc
    func myFunc() {
        counter++
    }
    func exModifyFunc() {
        println("I'm in function body")
        myFunc(123)
        println("myFunc called: ${counter} times")
        return 0
    }
    main(): Int64 {
        exModifyFunc()
    }
As before, the two code segments are located in different files. First compile the macro definition file macro_definition.cj, then compile the macro invocation file macro_call.cj to generate the executable.
In this example, the ModifyFunc macro takes a function declaration as input, so the parentheses can be omitted:
    @ModifyFunc
    func myFunc() {
        counter++
    }
After macro expansion, the following code is obtained:
    func myFunc(id: Int64) {
        println("start ${id}")
        counter++
        println("end")
    }
myFunc is called from exModifyFunc, which main invokes, and the argument it receives is supplied at that call site, forming a valid Cangjie program. The runtime output is as follows:
    I'm in function body
    start 123
    end
    myFunc called: 1 times
Attribute Macros
Compared to non-attribute macros, attribute macros include an additional Tokens type input parameter. This additional parameter allows developers to input extra information. For example, developers might want to use different macro expansion strategies in different invocation scenarios, which can be indicated via this attribute parameter. Additionally, this attribute parameter can accept any Tokens, which can be combined or concatenated with the code modified by the macro. Below is a simple example:
macro package define
// Macro definition with attribute
public macro Foo(attrTokens: Tokens, inputTokens: Tokens): Tokens {
return attrTokens + inputTokens // Concatenate attrTokens and inputTokens.
}
As shown in the macro definition above, an attribute macro has two parameters of type Tokens. Within the macro definition, attrTokens and inputTokens can undergo various transformations such as combination or concatenation, and the new Tokens is returned.
The invocation of an attribute macro is similar to that of a non-attribute macro. The additional parameter attrTokens is passed via [], and the invocation form is as follows:
import define.Foo
// attribute macro with parentheses
var a: Int64 = @Foo[1+](2+3)
// attribute macro without parentheses
@Foo[public]
struct Data {
var count: Int64 = 100
}
main() {}
- For the macro Foo invocation whose parameter is 2+3, the parameter is concatenated with the attribute 1+ inside []. After macro expansion, the result is var a: Int64 = 1+2+3.
- For the macro Foo invocation whose parameter is struct Data, the parameter is concatenated with the attribute public inside []. After macro expansion, the result is:
    public struct Data {
        var count: Int64 = 100
    }
Regarding attribute macros, the following points should be noted:
- Attribute macros can be applied to the same AST nodes as non-attribute macros. In essence, they only extend the inputs that can be passed to the macro.
- The rules for the legality of parameters inside parentheses are the same for attribute macros as for non-attribute macros.
- Special notes on the legality of attribute parameters inside square brackets:
  - The input must consist of a sequence of valid Tokens. Symbols like “#”, “`”, “\”, etc., when used alone, are not valid Cangjie Tokens and are not supported as input values.
  - If the input contains unmatched square brackets, they must be escaped using the escape symbol “\”.
  - If the input contains “@” as a Token, it must be escaped using the escape symbol “\”.
    // Illegal attribute Tokens
    @MacroName[#]() // Not a complete Token
    @MacroName[`]() // Not a complete Token
    @MacroName[@]() // @ is not escaped
    @MacroName[[]() // [ and ] do not match
    @MacroName[\(]() // Escaping is not supported for this symbol
    // Legal attribute Tokens
    @MacroName[#"abc"#]()
    @MacroName[`class`]()
    @MacroName[(]()
    @MacroName[()]()
    @MacroName[\[]()
    @MacroName[\@]()
- The macro definition and invocation types must be consistent: if the macro definition has two parameters (i.e., an attribute macro definition), the invocation must include [], and the content can be empty; if the macro definition has one parameter (i.e., a non-attribute macro definition), the invocation must not use [].
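The idea of choosing an expansion strategy from the attribute can be sketched as follows. WithModifier is a hypothetical macro name, and the snippet uses only Tokens operations that appear in this chapter (size and +); it is an illustration, not a recommended design.

```cangjie
macro package define
import std.ast.*

// Hypothetical macro: prepends the attribute tokens only when some were given
public macro WithModifier(attr: Tokens, input: Tokens): Tokens {
    if (attr.size == 0) {
        // Empty [] at the call site: return the code unchanged
        return input
    }
    return attr + input
}

// Possible call sites in another package (shown as comments, since macros
// cannot be invoked in the package that defines them):
//   @WithModifier[public] struct Data {}   // expands with the modifier prepended
//   @WithModifier[] struct Plain {}        // expands to the unchanged declaration
```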
Nested Macros
The Cangjie language does not support nested macro definitions but conditionally supports nested macro invocations within macro definitions and macro invocations.
Nested Macro Invocations in Macro Definitions
Below is an example of macro definitions containing nested macro calls.
Macro Definitions with Nested Calls
The getIdent macro is defined in macro package pkg1:
macro package pkg1
import std.ast.*
public macro getIdent(attr:Tokens, input:Tokens):Tokens {
return quote(
let decl = (parseDecl(input) as VarDecl).getOrThrow()
let name = decl.identifier.value
let size = name.size - 1
let $(attr) = Token(TokenKind.IDENTIFIER, name[0..size])
)
}
The Prop macro in package pkg2 contains a nested call to getIdent:
macro package pkg2
import std.ast.*
import pkg1.*
public macro Prop(input:Tokens):Tokens {
let v = parseDecl(input)
@getIdent[ident](input)
return quote(
$(input)
public prop $(ident): $(decl.declType) {
get() {
this.$(v.identifier)
}
}
)
}
Macro usage in package pkg3 calling the Prop macro:
package pkg3
import pkg2.*
class A {
@Prop
private let a_: Int64 = 1
}
main() {
let b = A()
println("${b.a}")
}
Note: Due to the constraint that macro definitions must be compiled before their call sites, the compilation order must be: pkg1 → pkg2 → pkg3. The Prop macro definition in pkg2:
public macro Prop(input:Tokens):Tokens {
let v = parseDecl(input)
@getIdent[ident](input)
return quote(
$(input)
public prop $(ident): $(decl.declType) {
get() {
this.$(v.identifier)
}
}
)
}
is first expanded into the following code before compilation:
public macro Prop(input: Tokens): Tokens {
let v = parseDecl(input)
let decl = (parseDecl(input) as VarDecl).getOrThrow()
let name = decl.identifier.value
let size = name.size - 1
let ident = Token(TokenKind.IDENTIFIER, name[0 .. size])
return quote(
$(input)
public prop $(ident): $(decl.declType) {
get() {
this.$(v.identifier)
}
}
)
}
Nested Macro Calls
A common scenario for nested macros occurs when macro-decorated code blocks contain other macro calls. A concrete example:
Macros Foo and Bar defined in package pkg1:
macro package pkg1
import std.ast.*
public macro Foo(input: Tokens): Tokens {
return input
}
public macro Bar(input: Tokens): Tokens {
return input
}
The addToMul macro defined in package pkg2:
macro package pkg2
import std.ast.*
public macro addToMul(inputTokens: Tokens): Tokens {
var expr: BinaryExpr = match (parseExpr(inputTokens) as BinaryExpr) {
case Some(v) => v
case None => throw Exception()
}
var op0: Expr = expr.leftExpr
var op1: Expr = expr.rightExpr
return quote(($(op0)) * ($(op1)))
}
Usage of these three macros in package pkg3:
package pkg3
import pkg1.*
import pkg2.*
@Foo
struct Data {
let a = 2
let b = @addToMul(2+3)
@Bar
public func getA() {
return a
}
public func getB() {
return b
}
}
main(): Int64 {
let data = Data()
var a = data.getA() // a = 2
var b = data.getB() // b = 6
println("a: ${a}, b: ${b}")
return 0
}
As shown above, macro Foo decorates struct Data, while macro calls addToMul and Bar appear inside the struct. The transformation rule for such nested scenarios is: expand the innermost macros (addToMul and Bar) first, then expand the outer macro (Foo). Multi-level nesting is allowed, with expansion always proceeding from innermost to outermost.
Nested macros can appear in both parenthesized and unparenthesized macro calls. These can be combined, but developers must ensure unambiguous expansion order:
var a = @foo(@foo1(2 * 3)+@foo2(1 + 3)) // foo1, foo2 have to be defined.
@Foo1 // Foo2 expands first, then Foo1 expands.
@Foo2[attr: struct] // Attribute macro can be used in nested macro.
struct Data{
@Foo3 @Foo4[123] var a = @bar1(@bar2(2 + 3) + 3) // bar2, bar1, Foo4, Foo3 expand in that order.
public func getA() {
return @foo(a + 2)
}
}
Message Passing Between Nested Macros
This section covers message passing between nested macro calls.
Inner macros can use the library function assertParentContext to ensure they are only called within specific outer macro contexts. If this condition isn’t met, the function throws an error. The insideParentContext function similarly checks the nesting relationship, but returns a Bool value instead of reporting an error. Example:
Macro definitions:
public macro Outer(input: Tokens): Tokens {
return input
}
public macro Inner(input: Tokens): Tokens {
assertParentContext("Outer")
return input
}
Macro calls:
@Outer var a = 0
@Inner var b = 0 // Error: The macro call 'Inner' should be nested within an 'Outer' context.
Here, Inner uses assertParentContext to verify it’s called within an Outer macro. Since this nesting doesn’t exist in the example, the compiler reports an error.
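When a hard error is not wanted, the boolean check can be used instead. The sketch below assumes the function is exposed by std.ast as insideParentContext (lowercase initial) and takes the outer macro's name as a String; the package name and the message text are illustrative only.

```cangjie
macro package define
import std.ast.*

public macro Outer(input: Tokens): Tokens {
    return input
}

public macro Inner(input: Tokens): Tokens {
    if (!insideParentContext("Outer")) {
        // Printed at compile time, while the macro is being expanded
        println("Inner used outside Outer; leaving the code unchanged")
    }
    return input
}
```

With this variant, a call such as @Inner var b = 0 outside any Outer context would compile, whereas the assertParentContext version rejects it.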
Inner macros can also communicate with outer macros via key/value pairs. During execution:
- Inner macros send messages via setItem
- Outer macros receive these messages via getChildMessages (a collection of key/value mappings)
Example macro definitions:
macro package define
import std.ast.*
public macro Outer(input: Tokens): Tokens {
let messages = getChildMessages("Inner")
let getTotalFunc = quote(public func getCnt() {
)
for (m in messages) {
let identName = m.getString("identifierName")
// let value = m.getString("key") // Receive multiple messages
getTotalFunc.append(Token(TokenKind.IDENTIFIER, identName))
getTotalFunc.append(quote(+))
}
getTotalFunc.append(quote(0))
getTotalFunc.append(quote(}))
let funcDecl = parseDecl(getTotalFunc)
let decl = (parseDecl(input) as ClassDecl).getOrThrow()
decl.body.decls.add(funcDecl)
return decl.toTokens()
}
public macro Inner(input: Tokens): Tokens {
assertParentContext("Outer")
let decl = parseDecl(input)
setItem("identifierName", decl.identifier.value)
// setItem("key", "value") // Multiple messages via different keys
return input
}
Macro calls:
import define.*
@Outer
class Demo {
@Inner var state = 1
@Inner var cnt = 42
}
main(): Int64 {
let d = Demo()
println("${d.getCnt()}")
return 0
}
In this code, Outer receives the variable names sent by the two Inner macro calls and automatically adds the following function to the class:
public func getCnt() {
state + cnt + 0
}
Workflow:
- Inner macros send messages via setItem
- The Outer macro receives the messages via getChildMessages (multiple Inner calls are possible)
- Values are retrieved via the message object’s getString method
Compilation, Errors, and Debugging
Macro Compilation and Usage
The current compiler enforces that macro definitions and macro calls cannot reside in the same package. The macro package must be compiled first, followed by the package that calls the macros. Within the macro-calling package, macro definitions are not permitted. Since macros need to be exported from one package to another, the compiler requires macro definitions to be declared with the public modifier.
Below is a simple example.
Source directory structure:
// Directory layout.
root_path
├── macros
│ └── m.cj
├── src
│ └── demo.cj
└── target
Macro definitions are placed in the macros subdirectory:
// macros/m.cj
// In this file, we define the macro Inner, Outer.
macro package define
import std.ast.*
public macro Inner(input: Tokens) {
return input
}
public macro Outer(input: Tokens) {
return input
}
Macro calls are placed in the src subdirectory:
// src/demo.cj
import define.*
@Outer
class Demo {
@Inner var state = 1
@Inner var cnt = 42
}
main() {
println("test macro")
}
When the compiled output of the macro definition file and the file using the macros are not in the same directory, the --import-path compilation option must be added to specify the path to the compiled output of the macro definition file. Below are the compilation commands for Linux (specific compilation options may evolve with cjc updates; refer to the latest cjc documentation for current options):
# First compile the macro definition file to generate the default dynamic library in the specified directory (the path can be specified, but not the library name)
cjc macros/m.cj --compile-macro --output-dir ./target
# Compile the file using macros, perform macro substitution, and generate the executable
cjc src/demo.cj -o demo --import-path ./target --output-dir ./target
# Run the executable
./target/demo
On Linux, this will generate macro_define.cjo for package management and the actual dynamic library file.
For Windows:
# Current directory: src
# First compile the macro definition file to generate the default dynamic library in the specified directory (the path can be specified, but not the library name)
cjc macros/m.cj --compile-macro --output-dir ./target
# Compile the file using macros, perform macro substitution, and generate the executable
cjc src/demo.cj -o demo.exe --import-path ./target --output-dir ./target
If the macro package depends on other dynamic libraries, ensure these dependencies are accessible during runtime (macro expansion relies on executing methods within the macro package). On Linux, set the LD_LIBRARY_PATH environment variable (on Windows, set PATH) to include the paths of the dependent libraries.
Note:
The macro substitution process depends on the Cangjie runtime. During macro substitution, the Cangjie runtime’s initialization configuration uses the default settings provided by macros. Configuration parameters can be queried using Cangjie runtime operational logs. Among these, cjHeapSize and cjStackSize can be modified by users, while others currently cannot. Note that all parameters are invalid on the OpenHarmony platform, where the Cangjie runtime uses default values. For Cangjie runtime initialization configurations, refer to the Runtime Initialization Optional Configurations section.
Parallel Macro Expansion
The --parallel-macro-expansion option can be added when compiling macro-calling files to enable parallel macro expansion. The compiler automatically analyzes dependencies between macro calls, allowing independent macro calls to execute in parallel. For example, the two @Inner calls in the above example can be expanded in parallel, reducing overall compilation time.
Caution:
If macro functions rely on global variables, using parallel macro expansion may introduce risks.
macro package define
import std.ast.*
import std.collection.HashMap
var Counts = HashMap<String, Int64>()
public macro Inner(input: Tokens) {
for (t in input) {
if (t.value.size == 0) {
continue
}
// Count occurrences of all valid token values
if (!Counts.contains(t.value)) {
Counts[t.value] = 0
}
Counts[t.value] = Counts[t.value] + 1
}
return input
}
public macro B(input: Tokens) {
return input
}
In the above code, if @Inner macro calls appear in multiple places and parallel macro expansion is enabled, accessing the global variable Counts may lead to conflicts, resulting in incorrect final counts.
It is recommended to avoid using global variables in macro functions. If unavoidable, either disable parallel macro expansion or protect global variables with Cangjie thread locks.
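The lock-based option can be sketched as follows. This is an assumption-laden illustration: it uses ReentrantMutex and the synchronized block from std.sync (newer releases may prefer a different mutex type), and countsLock is a name introduced here for the example.

```cangjie
macro package define
import std.ast.*
import std.collection.HashMap
import std.sync.*

var Counts = HashMap<String, Int64>()
// Hypothetical lock guarding every access to Counts
let countsLock = ReentrantMutex()

public macro Inner(input: Tokens): Tokens {
    for (t in input) {
        if (t.value.size == 0) {
            continue
        }
        // Only one expansion at a time may read and update the map
        synchronized(countsLock) {
            if (!Counts.contains(t.value)) {
                Counts[t.value] = 0
            }
            Counts[t.value] = Counts[t.value] + 1
        }
    }
    return input
}
```

Keeping the check and the update inside the same synchronized block matters: locking only the final assignment would still allow two expansions to initialize or increment the same entry concurrently.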
diagReport Error Mechanism
The Cangjie standard library std.ast package provides the diagReport interface for custom error reporting. This allows macro definers to issue custom warnings or errors when parsing input Tokens with invalid content.
The custom error interface mimics the native compiler’s error output format, supporting both warning and error-level messages.
The diagReport function prototype is as follows:
public func diagReport(level: DiagReportLevel, tokens: Tokens, message: String, hint: String): Unit
Parameter meanings:
- level: Error message severity level
- tokens: Tokens corresponding to the source code referenced in the error message
- message: Primary error message
- hint: Supplementary hint message
Refer to the following usage example.
Macro definition file:
// macro_definition.cj
macro package macro_definition
import std.ast.*
public macro testDef(input: Tokens): Tokens {
for (i in 0..input.size) {
if (input[i].kind == IDENTIFIER) {
diagReport(DiagReportLevel.ERROR, input[i..(i + 1)],
"This expression is not allowed to contain identifier",
"Here is the illegal identifier")
}
}
return input
}
Macro calling file:
// macro_call.cj
package macro_calling
import std.ast.*
import macro_definition.*
main(): Int64 {
let a = @testDef(1)
let b = @testDef(a)
let c = @testDef(1 + a)
return 0
}
During compilation of the macro-calling file, the following error messages will appear:
error: This expression is not allowed to contain identifier
==> call.cj:9:22:
|
9 | let b = @testDef(a)
| ^ Here is the illegal identifier
|
error: This expression is not allowed to contain identifier
==> call.cj:10:26:
|
10 | let c = @testDef(1 + a)
| ^ Here is the illegal identifier
|
2 errors generated, 2 errors printed.
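The same interface can report at warning level. The sketch below mirrors the macro above but passes DiagReportLevel.WARNING; the package name, macro name, and message texts are hypothetical.

```cangjie
// warn_definition.cj (hypothetical file and macro names)
macro package warn_definition
import std.ast.*

public macro warnIdent(input: Tokens): Tokens {
    for (i in 0..input.size) {
        if (input[i].kind == TokenKind.IDENTIFIER) {
            // Report at warning level instead of error level
            diagReport(DiagReportLevel.WARNING, input[i..(i + 1)],
                "This expression contains an identifier",
                "Consider using a literal here")
        }
    }
    return input
}
```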
Using --debug-macro to Output Macro Expansion Results
When macros generate code at compile time, errors can be particularly hard to diagnose. Macros transform the developer’s source code into different code fragments, and the compiler reports errors against the final macro-generated code, which does not correspond directly to the original source.
To address this, Cangjie macros provide a debug mode. In this mode, developers can inspect the complete macro-expanded code from the debug files generated by the compiler, as shown below.
Macro definition file:
macro package define
import std.ast.*
public macro Outer(input: Tokens): Tokens {
let messages = getChildMessages("Inner")
let getTotalFunc = quote(public func getCnt() {
)
for (m in messages) {
let identName = m.getString("identifierName")
getTotalFunc.append(Token(TokenKind.IDENTIFIER, identName))
getTotalFunc.append(quote(+))
}
getTotalFunc.append(quote(0))
getTotalFunc.append(quote(}))
let funcDecl = parseDecl(getTotalFunc)
let decl = (parseDecl(input) as ClassDecl).getOrThrow()
decl.body.decls.add(funcDecl)
return decl.toTokens()
}
public macro Inner(input: Tokens): Tokens {
assertParentContext("Outer")
let decl = parseDecl(input)
setItem("identifierName", decl.identifier.value)
return input
}
Macro calling file demo.cj:
import define.*
@Outer
class Demo {
@Inner var state = 1
@Inner var cnt = 42
}
main(): Int64 {
let d = Demo()
println("${d.getCnt()}")
return 0
}
When compiling the file that uses macros, add the --debug-macro option to enable Cangjie macro’s debug mode.
cjc --debug-macro demo.cj --import-path ./target
Note:
If using the Cangjie CJPM project manager for compilation, add the --debug-macro compilation option in the cjpm.toml configuration file to enable macro debug mode.
compile-option = "--debug-macro"
In debug mode, a temporary file demo.cj.macrocall will be generated, containing the macro-expanded code as follows:
// demo.cj.macrocall
/* ===== Emitted by MacroCall @Outer in demo.cj:3:1 ===== */
class Demo {
var state = 1
var cnt = 42
public func getCnt() {
state + cnt + 0
}
}
/* ===== End of the Emit ===== */
If the expanded macro code contains semantic errors, the compiler’s error messages will trace back to the specific line and column numbers in the expanded code. The debug mode of Cangjie macros has the following considerations:
- Debug mode rearranges the line and column information of the source code, so it is not suitable for certain special line-breaking scenarios. For example:
    // before expansion
    @M{} - 2 // macro M returns 2
    // after expansion
    // ===== Emitted by Macro M at line 1 ===
    2
    // ===== End of the Emit =====
    - 2
  Debug mode should not be used in such cases, where line breaks alter the semantics.
- Debugging macro calls within macro definitions is not supported and will result in compilation errors.
    public macro M(input: Tokens) {
        let a = @M2(1+2) // M2 is inside macro M, not suitable for debug mode.
        return input + quote($a)
    }
- Debugging macros with parentheses is not supported.
    // main.cj
    main() {
        // For macros with parentheses, newlines introduced by debug mode will change the semantics
        // of the expression, so it is not suitable for debug mode.
        let t = @M(1+2)
    }
Macro Package Definition and Import
In the Cangjie language, macro definitions must be placed within a package declared by macro package. A package scoped by macro package only allows macro definitions to be externally visible, while other declarations remain package-private.
Note:
Re-exported declarations are also permitted to be externally visible. For concepts related to package management and re-exporting, refer to the Package Import chapter.
// file define.cj
macro package define // Compiling define.cjo with macro attribute
import std.ast.*
public func A() {} // Error: Macro packages disallow externally visible non-macro definitions. This will raise an error.
public macro M(input: Tokens): Tokens { // Macro M is externally visible
return input
}
It is important to note that within a macro package, declarations from other macro packages and non-macro packages can be re-exported. In non-macro packages, only declarations from non-macro packages are allowed to be re-exported.
Refer to the following examples:
- Define macro M1 in macro package A:
  macro package A
  import std.ast.*
  public macro M1(input: Tokens): Tokens {
      return input
  }
  The compilation command is as follows:
  cjc A.cj --compile-macro
- Define a public function f1 in non-macro package B. Note that symbols from a macro package cannot be re-exported in a non-macro package.
  package B
  // public import A.* // Error: Re-exporting a macro package in a regular package is not allowed.
  public func f1(input: Int64): Int64 {
      return input
  }
  The compilation command is as follows. Here, the --output-type option is used to compile package B into a dynamic library. For details about cjc compilation options, refer to the “Appendix > cjc Compilation Options” chapter.
  cjc B.cj --output-type=dylib -o libB.so
- Define macro M2 in macro package C, which depends on content from packages A and B. Observe that a macro package can re-export symbols from both macro packages and non-macro packages.
  macro package C
  public import A.* // Correct: Macro packages allow re-exporting within a macro package.
  public import B.* // Correct: Non-macro packages are also allowed to be re-exported in a macro package.
  import std.ast.*
  public macro M2(input: Tokens): Tokens {
      return @M1(input) + Token(TokenKind.NL) + quote(f1(1))
  }
  The compilation command is as follows. Note that explicit linking of package B’s dynamic library is required:
  cjc C.cj --compile-macro -L. -lB
- Use macro M2 in main.cj:
  import C.*
  main() {
      @M2(let a = 1)
  }
  The compilation command is as follows:
  cjc main.cj -o main -L. -lB
  The expanded result of macro M2 in main.cj is as follows:
  import C.*
  main() {
      let a = 1
      f1(1)
  }
As seen in main.cj, the symbol f1 from package B appears. Macro authors can re-export symbols from package B within package C, allowing macro users to correctly compile macro-expanded code by simply importing the macro package. If main.cj only imports the macro symbol using import C.M2, an undeclared identifier 'f1' error will occur.
Built-in Compilation Tags
The Cangjie language provides several predefined compilation tags that can be used to control the compilation behavior of the Cangjie compiler.
Source Location
Cangjie offers several built-in compilation tags for obtaining source code locations during compilation.
- @sourcePackage() expands to a String type literal containing the package name of the source file where this tag is located.
- @sourceFile() expands to a String type literal containing the filename of the source file where this tag is located.
- @sourceLine() expands to an Int64 type literal containing the line number in the source file where this tag is located.
These compilation tags can be used within any expression as long as they comply with type-checking rules. Examples:
func test1() {
let s: String = @sourceFile() // The value of `s` is the current source file name
}
func test2(n!: Int64 = @sourceLine()) { /* at line 5 */
// The default value of `n` is the source file line number of the definition of `test2`
println(n) // print 5
}
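As a further illustration, the sketch below reads all three location tags inside main. The printed values depend on the package name and on where the file is stored, so no output is shown. The only assumption is that each tag may initialize a variable of the matching type, which follows from the rule that the tags can appear in any expression that type-checks.

```cangjie
main() {
    // Each tag is replaced by a literal at compile time.
    let pkg: String = @sourcePackage()  // package name of this source file
    let file: String = @sourceFile()    // file name of this source file
    let line: Int64 = @sourceLine()     // line number of this statement
    println("${pkg} ${file}:${line}")
}
```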
Conditional Compilation
Conditional compilation is a technique for selectively compiling different code segments based on specific conditions within a program. In Cangjie it is expressed with the @When tag. The primary applications of conditional compilation include:
- Platform Adaptation: Supports selective code compilation based on the current compilation environment to achieve cross-platform compatibility.
- Feature Selection: Enables flexible configuration by selectively compiling code according to different requirements.
- Debugging Support: Facilitates compiling debugging-related code in debug mode to enhance program performance and security. For example, compiling debug information or logging-related code in debug mode while excluding it from release builds.
- Performance Optimization: Allows selective code compilation based on predefined conditions to improve program performance.
For detailed information about conditional compilation, please refer to the Conditional Compilation chapter, which will not be further elaborated here.
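For orientation only, here is a minimal sketch of platform adaptation with @When. It assumes the built-in condition variable os and the string values "Linux" and "Windows" as described in the Conditional Compilation chapter; the function name platformName is hypothetical. On any other target neither declaration is compiled, so the call in main would not resolve.

```cangjie
// Only the declaration whose condition holds for the compilation target is kept.
@When[os == "Linux"]
func platformName(): String {
    "Linux"
}

@When[os == "Windows"]
func platformName(): String {
    "Windows"
}

main() {
    println(platformName())
}
```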
@FastNative
To improve performance when interoperating with C, Cangjie provides the @FastNative tag to optimize calls to C functions. Note that @FastNative can only be used with functions declared as foreign.
First, compile the following C program to generate the dynamic library file libcProg.so:
#include <stdio.h>
char* foo()
{
static char str[] = "this is an example";
return str;
}
Cangjie file main.cj:
@FastNative
foreign func foo(): CPointer<Int32>
@FastNative
foreign func printf(fmt: CPointer<Int32>, ...): Int32
main(): Int32 {
unsafe{
let str = foo()
printf(str)
}
}
For specific compilation instructions, please refer to the Appendix. Below is the corresponding compilation command for this example:
cjc -L . -lcProg ./main.cj
After executing the above command to compile main.cj, an executable file main is generated, with the following execution result:
this is an example
When using @FastNative to modify foreign functions, developers must ensure that the corresponding C functions meet the following two requirements:
- The overall execution time of the function should not be too long. For example: the function must not contain large loops; it must not exhibit blocking behavior, such as calling sleep, wait, etc.
- The function must not call Cangjie methods internally.
@Frozen
The @Frozen annotation can be used to mark functions and properties whose internal implementation is guaranteed not to change in future versions. It signals the developer’s commitment to the stability of that function/property across releases. Functions and properties annotated with @Frozen must not change their signatures or bodies in subsequent versions. Under the same compiler and compilation options, the generated artifacts for the function or property must therefore remain identical between versions.
The @Frozen annotation can only be applied to:
- All kinds of function definitions except local (nested) functions, including global functions, member functions, constructors, primary constructors, and destructors.
- All kinds of member property definitions.
Usage example:
@Frozen
public func test(): Unit {}
public class testClass {
@Frozen
public func testFunc(): Unit {}
@Frozen
public prop testProp: Unit {
get() {}
}
}
@Attribute
Cangjie provides the built-in @Attribute tag, which developers can use to attach attribute values to a declaration. An attribute value can be either an identifier or a string. Below is a simple example where the variable cnt is given an identifier-type attribute State, and the variable bcnt is given a string-type attribute "Binding".
@Attribute[State] var cnt = 0 // identifier
@Attribute["Binding"] var bcnt = 0 // string
Additionally, the standard library std.ast package provides the getAttrs() method to retrieve a node’s attributes and the hasAttr(attrs: String) method to check whether a node has a specific attribute. Here is a concrete example.
Macro definition:
public macro Component(input: Tokens): Tokens {
var varDecl = parseDecl(input)
if (varDecl.hasAttr("State")) { // Returns true if the node is marked with the "State" attribute, otherwise false
var attrs = varDecl.getAttrs() // Returns a set of Tokens
println(attrs[0].value)
}
return input
}
Macro invocation:
@Component(
@Attribute[State] var cnt = 0
)
@Deprecated
@Deprecated indicates that an API is deprecated. While it remains temporarily usable, it will be removed or changed in the future, and developers are advised not to use it. Example:
@Deprecated["Use boo instead", since: "1.3.4"]
func foo() {}
main() {
foo()
}
The compiler will issue a warning during compilation:
warning: function 'foo' is deprecated since 1.3.4. Use boo instead
==> file.cj:5:5:
|
5 | foo()
| ^^^ deprecated
|
# note: this warning can be suppressed by setting the compiler option `-Woff deprecated`
1 warning generated, 1 warning printed.
The @Deprecated annotation can be applied to the following declarations:
- Classes, interfaces, structs, enums, enum constructors
- Top-level (global) functions or variables
- Static or non-static member functions, member variables, properties, property setters
- Operator functions
- Extension member functions, static functions, properties, or property setters
- Foreign functions or functions declared within foreign blocks
- Constructors and primary constructors
- Abstract functions and properties
- Type aliases (including associated types)
- Named parameters in functions with default arguments
- const variables and functions
- Macro definitions
- Annotation classes
@Deprecated Parameters
- message: String - Describes why the declaration is deprecated and how to migrate.
- since!: ?String - The version in which the deprecation occurred.
- strict!: Bool - Defaults to false, triggering a warning at call sites of the marked API. If set to true, it triggers a compilation error.
@Deprecated["Use Macro2", since: "1990", strict: true]
public macro Macro(input: Tokens): Tokens {
return input
}
Practical Examples
Fast Exponentiation Calculation
This example demonstrates how macros can perform evaluation at compile time to generate optimized code. When calculating the power n ^ e, if e is a (relatively large) integer, the computation can be accelerated by repeated squaring (instead of multiplying by n one factor at a time). This algorithm can be implemented directly with a while loop, for example:
func power(n: Int64, e: Int64) {
var result = 1
var vn = n
var ve = e
while (ve > 0) {
if (ve % 2 == 1) {
result *= vn
}
ve /= 2
if (ve > 0) {
vn *= vn
}
}
result
}
However, this implementation requires analyzing the value of e each time, with multiple checks and updates to ve within loops and conditional statements. Additionally, the implementation only supports cases where n is of type Int64. To support other types of n, the issue of expressing result = 1 must also be addressed. If the specific value of e is known in advance, the code can be written more simply. For example, if e is known to be 10, the entire loop can be unrolled as follows:
func power_10(n: Int64) {
var vn = n
vn *= vn // vn = n ^ 2
var result = vn // result = n ^ 2
vn *= vn // vn = n ^ 4
vn *= vn // vn = n ^ 8
result *= vn // result = n ^ 10
result
}
Of course, manually writing this code is tedious. The goal is to automatically generate this code given the value of e. Macros can achieve this. The usage example is as follows:
public func power_10(n: Int64) {
@power[10](n)
}
The macro-expanded code is (from the .macrocall file):
public func power_10(n: Int64) {
/* ===== Emitted by MacroCall @power in main.cj:20:5 ===== */
var _power_vn = n
_power_vn *= _power_vn
var _power_result = _power_vn
_power_vn *= _power_vn
_power_vn *= _power_vn
_power_result *= _power_vn
_power_result
/* ===== End of the Emit ===== */
}
Below is the implementation of the @power macro.
macro package define
import std.ast.*
import std.convert.*
public macro power(attrib: Tokens, input: Tokens) {
let attribExpr = parseExpr(attrib)
if (let Some(litExpr) <- (attribExpr as LitConstExpr)) {
let lit = litExpr.literal
if (lit.kind != TokenKind.INTEGER_LITERAL) {
diagReport(DiagReportLevel.ERROR, attrib,
"Attribute must be integer literal",
"Expected integer literal")
}
var n = Int64.parse(lit.value)
var result = quote(var _power_vn = $(input)
)
var flag = false
while (n > 0) {
if (n % 2 == 1) {
if (!flag) {
result += quote(var _power_result = _power_vn
)
flag = true
} else {
result += quote(_power_result *= _power_vn
)
}
}
n /= 2
if (n > 0) {
result += quote(_power_vn *= _power_vn
)
}
}
result += quote(_power_result)
return result
} else {
diagReport(DiagReportLevel.ERROR, attrib,
"Attribute must be integer literal",
"Expected integer literal")
}
return input
}
The explanation of this code is as follows:
- First, confirm that the input attribute attrib is an integer literal; otherwise, report an error via diagReport. Parse this literal into an integer n.
- Let result be the output code accumulated so far, starting with the declaration var _power_vn. The name _power_vn is chosen because it is unlikely to conflict with other variable names.
- Enter the while loop, where the boolean variable flag indicates whether var _power_result has been initialized. The rest of the code is structured like the power function shown earlier; the difference is that the while loop and if conditions run at compile time to decide what code to generate, rather than making these decisions at runtime. The generated code consists of an appropriate combination of _power_result *= _power_vn and _power_vn *= _power_vn.
- Finally, add the code to return _power_result and return this code as the macro’s output value.
Place this code in the macros/power.cj file and add the following test in main.cj:
import define.*
public func power_10(n: Int64) {
@power[10](n)
}
main() {
let a = 3
println(power_10(a))
}
The output is:
59049
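The generation logic of the macro can be checked without any macro machinery. The following sketch mirrors the while loop of @power at runtime and records the statements it would emit as plain strings. The function name emittedStatements and the variable rest are hypothetical, the strings stand in for the quote(...) fragments, and ArrayList.add is assumed to behave as it does elsewhere in this document.

```cangjie
import std.collection.*

// Mirrors the loop in the @power macro, but records each emitted
// statement as a String instead of appending Tokens.
func emittedStatements(e: Int64): ArrayList<String> {
    let out = ArrayList<String>()
    out.add("var _power_vn = n")
    var rest = e     // remaining exponent, halved on every iteration
    var flag = false // becomes true once _power_result has been declared
    while (rest > 0) {
        if (rest % 2 == 1) {
            if (!flag) {
                out.add("var _power_result = _power_vn")
                flag = true
            } else {
                out.add("_power_result *= _power_vn")
            }
        }
        rest /= 2
        if (rest > 0) {
            out.add("_power_vn *= _power_vn")
        }
    }
    out.add("_power_result")
    return out
}

main() {
    for (stmt in emittedStatements(10)) {
        println(stmt)
    }
}
```

For e = 10 the loop visits 10, 5, 2, and 1, so the sketch prints seven statements in the same order as the .macrocall expansion shown earlier, with the argument written as n.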
Memoize Macro
Memoization is a common technique in dynamic programming algorithms. It stores the results of already computed subproblems so that when the same subproblem appears again, the result can be directly retrieved from the table, avoiding redundant computations and improving algorithm efficiency.
Typically, using memoization requires developers to manually implement storage and retrieval functionality. With macros, this process can be automated. The macro’s effect is as follows:
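import std.collection.* // HashMap, used by the code that @Memoize generates
import std.time.*       // DateTime
import define.*         // the @Memoize macro
@Memoize[true]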
func fib(n: Int64): Int64 {
if (n == 0 || n == 1) {
return n
}
return fib(n - 1) + fib(n - 2)
}
main() {
let start = DateTime.now()
let f35 = fib(35)
let end = DateTime.now()
println("fib(35): ${f35}")
println("execution time: ${(end - start).toMicroseconds()} us")
}
In the above code, the fib function is implemented using simple recursion. Without the @Memoize[true] annotation, the function’s runtime would grow exponentially with n. For example, if the @Memoize[true] line is removed or true is changed to false in the above code, the main function’s output would be:
fib(35): 9227465
execution time: 199500 us
Restoring @Memoize[true], the output becomes:
fib(35): 9227465
execution time: 78 us
The same answer with significantly reduced computation time demonstrates that @Memoize indeed implements memoization.
To understand the principle of @Memoize, the macro-expanded result of the fib function is shown below (from the .macrocall file, but formatted for better readability).
import std.collection.*
var memoizeFibMap = HashMap<Int64, Int64>()
func fib(n: Int64): Int64 {
if (memoizeFibMap.contains(n)) {
return memoizeFibMap.get(n).getOrThrow()
}
let memoizeEvalResult = { =>
if (n == 0 || n == 1) {
return n
}
return fib(n - 1) + fib(n - 2)
}()
memoizeFibMap.add(n, memoizeEvalResult)
return memoizeEvalResult
}
The execution flow of the above code is as follows:
- First, define memoizeFibMap as a hash table from Int64 to Int64, where the first Int64 corresponds to the type of fib’s single parameter, and the second Int64 corresponds to fib’s return type.
- Next, in the function body, check if the input parameter exists in memoizeFibMap; if so, immediately return the stored value. Otherwise, use the original function body of fib to compute the result. Here, a (parameterless) anonymous function is used so that fib’s function body requires no changes and any way of exiting the fib function (including intermediate returns, returning the last expression, etc.) is handled.
- Finally, store the computed result in memoizeFibMap and return the result.
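To see that these steps do not depend on the shape of the function body, the sketch below applies them by hand to a function with several return statements. The function steps (the number of Collatz steps needed to reach 1) and the map name memoizeStepsMap are hypothetical; this is hand-written code, not macro output.

```cangjie
import std.collection.*

// Step 1: a table from the parameter type to the return type.
var memoizeStepsMap = HashMap<Int64, Int64>()

// Number of Collatz steps from n down to 1; only meaningful for n >= 1.
func steps(n: Int64): Int64 {
    // Step 2: return the stored value if this argument was seen before.
    if (memoizeStepsMap.contains(n)) {
        return memoizeStepsMap.get(n).getOrThrow()
    }
    // The original body runs unchanged inside a parameterless lambda,
    // so each `return` leaves the lambda rather than `steps`.
    let memoizeEvalResult = { =>
        if (n == 1) {
            return 0
        }
        if (n % 2 == 0) {
            return steps(n / 2) + 1
        }
        return steps(3 * n + 1) + 1
    }()
    // Step 3: store the result, then return it.
    memoizeStepsMap.add(n, memoizeEvalResult)
    return memoizeEvalResult
}

main() {
    println(steps(6)) // 6, 3, 10, 5, 16, 8, 4, 2, 1: prints 8
}
```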
With such a “template,” the implementation of the macro becomes straightforward. The complete code is as follows.
macro package define
import std.ast.*
public macro Memoize(attrib: Tokens, input: Tokens) {
if (attrib.size != 1 || attrib[0].kind != TokenKind.BOOL_LITERAL) {
diagReport(DiagReportLevel.ERROR, attrib,
"Attribute must be a boolean literal (true or false)",
"Expected boolean literal (true or false) here")
}
let memoized = (attrib[0].value == "true")
if (!memoized) {
return input
}
let fd = FuncDecl(input)
if (fd.funcParams.size != 1) {
diagReport(DiagReportLevel.ERROR, fd.lParen + fd.funcParams.toTokens() + fd.rParen,
"Input function to memoize should take exactly one argument",
"Expect only one argument here")
}
let memoMap = Token(TokenKind.IDENTIFIER, "_memoize_" + fd.identifier.value + "_map")
let arg1 = fd.funcParams[0]
return quote(
var $(memoMap) = HashMap<$(arg1.paramType), $(fd.declType)>()
func $(fd.identifier)($(arg1)): $(fd.declType) {
if ($(memoMap).contains($(arg1.identifier))) {
return $(memoMap).get($(arg1.identifier)).getOrThrow()
}
let memoizeEvalResult = { => $(fd.block.nodes) }()
$(memoMap).add($(arg1.identifier), memoizeEvalResult)
return memoizeEvalResult
}
)
}
First, perform validity checks on the attribute and the input. The attribute must be a boolean literal; if it is false, return the input unchanged. Otherwise, parse the input as a function declaration (FuncDecl) and verify that it has exactly one parameter. Then, generate the name of the hash table variable, choosing one that is unlikely to cause conflicts. Finally, use the quote template to generate the returned code, which includes the hash table variable name, the single parameter’s name and type, and the input function’s return type.
An Extension of the dprint Macro
An earlier example used a macro for printing expressions, but that macro could only accept one expression at a time. The goal here is to extend it to accept multiple expressions separated by commas. The following demonstrates how to use parseExprFragment to achieve this.
The macro implementation is as follows:
macro package define
import std.ast.*
import std.collection.* // ArrayList
public macro dprint2(input: Tokens) {
let exprs = ArrayList<Expr>()
var index: Int64 = 0
while (true) {
let (expr, nextIndex) = parseExprFragment(input, startFrom: index)
exprs.add(expr)
if (nextIndex == input.size) {
break
}
if (input[nextIndex].kind != TokenKind.COMMA) {
diagReport(DiagReportLevel.ERROR, input[nextIndex..nextIndex+1],
"Input must be a comma-separated list of expressions",
"Expected comma")
}
index = nextIndex + 1 // Skip comma
}
let result = quote()
for (expr in exprs) {
result.append(quote(
print($(expr.toTokens().toString()) + " = ")
println($(expr))
))
}
return result
}
Usage example:
import define.*
main() {
let x = 3
let y = 2
@dprint2(x, y, x + y)
}
Output result:
x = 3
y = 2
x + y = 5
In the macro implementation, a while loop is used to parse each expression sequentially starting from index 0. The variable index stores the current parsing position. Each time parseExprFragment is called, it starts from the current position and returns the parsed position (along with the parsed expression). If the parsed position reaches the end of input, the loop exits. Otherwise, it checks whether the reached position contains a comma - if not, it reports an error and exits; if it is a comma, it skips the comma and starts the next parsing cycle. After obtaining the expression list, each expression is output sequentially.
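As a reading aid, the call @dprint2(x, y, x + y) can be thought of as expanding to the statements below. This is a hand-written equivalent derived from the quote template in the macro, not captured compiler output, and the exact token layout in a .macrocall file may differ.

```cangjie
main() {
    let x = 3
    let y = 2
    // One print/println pair per parsed expression: the first argument
    // is the expression's source text, the second its value.
    print("x" + " = ")
    println(x)
    print("y" + " = ")
    println(y)
    print("x + y" + " = ")
    println(x + y)
}
```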
A Simple DSL
This case demonstrates how to use macros to implement a simple DSL (Domain Specific Language). LINQ (Language Integrated Query) is a component of Microsoft’s .NET framework that provides a unified data query syntax, allowing developers to use SQL-like query statements to manipulate various data sources. Here, we only demonstrate support for the simplest LINQ syntax.
The desired syntax is:
from <variable> in <list> where <condition> select <expression>
Where variable is an identifier, and list, condition, and expression are all expressions. Therefore, the macro implementation strategy involves sequentially extracting the identifier and expressions while verifying that intermediate keywords are correct. Finally, it generates query results composed of the extracted parts.
The macro implementation is as follows:
macro package define
import std.ast.*
public macro linq(input: Tokens) {
let syntaxMsg = "Syntax is \"from <attrib> in <table> where <cond> select <expr>\""
if (input.size == 0 || input[0].value != "from") {
diagReport(DiagReportLevel.ERROR, input[0..1], syntaxMsg,
"Expected keyword \"from\" here.")
}
if (input.size <= 1 || input[1].kind != TokenKind.IDENTIFIER) {
diagReport(DiagReportLevel.ERROR, input[1..2], syntaxMsg,
"Expected identifier here.")
}
let attribute = input[1]
if (input.size <= 2 || input[2].value != "in") {
diagReport(DiagReportLevel.ERROR, input[2..3], syntaxMsg,
"Expected keyword \"in\" here.")
}
var index: Int64 = 3
let (table, nextIndex) = parseExprFragment(input, startFrom: index)
if (nextIndex == input.size || input[nextIndex].value != "where") {
diagReport(DiagReportLevel.ERROR, input[nextIndex..nextIndex+1], syntaxMsg,
"Expected keyword \"where\" here.")
}
index = nextIndex + 1 // Skip 'where'
let (cond, nextIndex2) = parseExprFragment(input, startFrom: index)
if (nextIndex2 == input.size || input[nextIndex2].value != "select") {
diagReport(DiagReportLevel.ERROR, input[nextIndex2..nextIndex2+1], syntaxMsg,
"Expected keyword \"select\" here.")
}
index = nextIndex2 + 1 // Skip 'select'
let (expr, nextIndex3) = parseExprFragment(input, startFrom: index)
return quote(
for ($(attribute) in $(table)) {
if ($(cond)) {
println($(expr))
}
}
)
}
Usage example:
import define.*
main() {
@linq(from x in 1..=10 where x % 2 == 1 select x * x)
}
This example filters the odd numbers from the list 1, 2, …, 10 and prints the square of each. The output is:
1
9
25
49
81
As can be seen, a significant portion of the macro implementation is dedicated to parsing and validating input Tokens, which is crucial for the macro’s usability. Actual LINQ languages (and most DSLs) have more complex syntax and require a complete parsing mechanism to determine what to parse next by identifying different keywords or connectors.
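Substituting the parsed parts into the quote template gives the following hand-written equivalent of the usage example: x is the identifier, 1..=10 the table, x % 2 == 1 the condition, and x * x the selected expression. It is a sketch of the expansion, not captured compiler output.

```cangjie
main() {
    // from x in 1..=10
    for (x in 1..=10) {
        // where x % 2 == 1
        if (x % 2 == 1) {
            // select x * x
            println(x * x)
        }
    }
}
```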
Dynamic Features
This chapter introduces the dynamic features of Cangjie, which enable developers to implement certain functionalities more elegantly. The dynamic features of Cangjie primarily include reflection.
Basic Introduction to Cangjie Reflection
Reflection refers to a mechanism that allows a program to access, inspect, and modify its own state or behavior.
The dynamic feature of reflection offers the following advantages:
- Enhances program flexibility and extensibility.
- Enables programs to determine the types of various objects at runtime and perform operations such as enumeration and invocation of their members.
- Allows the creation of new types at runtime without the need for hardcoding in advance.
However, reflection calls typically exhibit lower performance compared to direct calls. Therefore, the reflection mechanism is mainly applied to system frameworks that require high flexibility and extensibility.
How to Obtain TypeInfo
For Cangjie’s reflection feature, it is essential to understand the TypeInfo type. This core type records type information for any given type and defines methods for retrieving type information, setting values, etc. To facilitate user operations, Cangjie also provides a series of information types such as ClassTypeInfo, PrimitiveTypeInfo, and ParameterInfo.
Three static functions named of can be used to obtain TypeInfo instances.
public class TypeInfo {
public static func of(a: Any): TypeInfo
public static func of(a: Object): ClassTypeInfo
public static func of<T>(): TypeInfo
}
When an of function taking a parameter of type Any or Object is used, the result is the runtime type information of the instance. The generic of function returns the static type information of its type argument. Both produce identical information, but they are not guaranteed to return the same object.
For example, reflection can be used to obtain type information for a custom type.
import std.reflect.*
class Foo {}
main() {
let a: Foo = Foo()
let info: TypeInfo = TypeInfo.of(a)
let info2: TypeInfo = TypeInfo.of<Foo>()
println(info)
println(info2)
}
Compiling and executing the above code will output:
default.Foo
default.Foo
Additionally, TypeInfo provides a static function get, which retrieves TypeInfo by passing in a type name.
public class TypeInfo {
public static func get(qualifiedName: String): TypeInfo
}
Note that the argument must be a fully qualified name of the form module.package.type. For compiler-preloaded types, including those in the core package and built-in compiler types such as primitive types, Option, and Iterable, the search string should use the type name directly, without the package or module prefix. If the runtime cannot find an instance of the corresponding type, an InfoNotFoundException is thrown.
let t1: TypeInfo = TypeInfo.get("Int64")
let t2: TypeInfo = TypeInfo.get("default.Foo")
let t3: TypeInfo = TypeInfo.get("std.socket.TcpSocket")
let t4: TypeInfo = TypeInfo.get("net.http.ServerBuilder")
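Because a failed lookup throws rather than returning an empty value, callers that accept type names from outside the program may want to guard the call. The sketch below assumes that InfoNotFoundException is exported by std.reflect and exposes the message member used by other exceptions in this document; the type name default.Missing is hypothetical and deliberately does not exist.

```cangjie
import std.reflect.*

main() {
    try {
        // No type with this qualified name is defined in the program.
        let info = TypeInfo.get("default.Missing")
        println(info)
    } catch (e: InfoNotFoundException) {
        println("lookup failed: ${e.message}")
    }
}
```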
This method cannot be used to obtain an uninstantiated generic type.
import std.collection.*
import std.reflect.*
class A<T> {
A(public let t: T) {}
}
class B<T> {
B(public let t: T) {}
}
main() {
let aInfo: TypeInfo = TypeInfo.get("default.A<Int64>") // Error: `default.A<Int64>` is not instantiated, so InfoNotFoundException is thrown
let b: B<Int64> = B<Int64>(1)
let bInfo: TypeInfo = TypeInfo.get("default.B<Int64>") // OK: `default.B<Int64>` has been instantiated.
}
How to Use Reflection to Access Members
Once the corresponding type information class, i.e., TypeInfo, is obtained, its interfaces can be used to access instance members and static members of the corresponding class. Additionally, ClassTypeInfo, a subclass of TypeInfo, provides interfaces for accessing the class’s public constructors, member variables, properties, and functions. Cangjie’s reflection is designed to access only the public members of a type, meaning that members declared private or protected are invisible to reflection.
For example, to retrieve and modify an instance member variable of a class at runtime:
import std.reflect.*
public class Foo {
public static var param1 = 20
public var param2 = 10
}
main(): Unit{
let obj = Foo()
let info = TypeInfo.of(obj)
let staticVarInfo = info.getStaticVariable("param1")
let instanceVarInfo = info.getInstanceVariable("param2")
println("Initial values of member variables")
print("Static member variable ${staticVarInfo} of Foo = ")
println((staticVarInfo.getValue() as Int64).getOrThrow())
print("Instance member variable ${instanceVarInfo} of obj = ")
println((instanceVarInfo.getValue(obj) as Int64).getOrThrow())
println("Modifying member variables")
staticVarInfo.setValue(8)
instanceVarInfo.setValue(obj, 25)
print("Static member variable ${staticVarInfo} of Foo = ")
println((staticVarInfo.getValue() as Int64).getOrThrow())
print("Instance member variable ${instanceVarInfo} of obj = ")
println((instanceVarInfo.getValue(obj) as Int64).getOrThrow())
return
}
Compiling and executing the above code will output:
Initial values of member variables
Static member variable static param1: Int64 of Foo = 20
Instance member variable param2: Int64 of obj = 10
Modifying member variables
Static member variable static param1: Int64 of Foo = 8
Instance member variable param2: Int64 of obj = 25
Similarly, properties can be inspected and modified via reflection.
import std.reflect.*
public class Foo {
public let _p1: Int64 = 1
public prop p1: Int64 {
get() { _p1 }
}
public var _p2: Int64 = 2
public mut prop p2: Int64 {
get() { _p2 }
set(v) { _p2 = v }
}
}
main(): Unit{
let obj = Foo()
let info = TypeInfo.of(obj)
let instanceProps = info.instanceProperties.toArray()
println("Instance member properties of obj include ${instanceProps}")
let PropInfo1 = info.getInstanceProperty("p1")
let PropInfo2 = info.getInstanceProperty("p2")
println((PropInfo1.getValue(obj) as Int64).getOrThrow())
println((PropInfo2.getValue(obj) as Int64).getOrThrow())
if (PropInfo1.isMutable()) {
PropInfo1.setValue(obj, 10)
}
if (PropInfo2.isMutable()) {
PropInfo2.setValue(obj, 20)
}
println((PropInfo1.getValue(obj) as Int64).getOrThrow())
println((PropInfo2.getValue(obj) as Int64).getOrThrow())
return
}
Compiling and executing the above code will output:
Instance member properties of obj include [prop p1: Int64, mut prop p2: Int64]
1
2
1
20
Function calls can also be made via the reflection mechanism.
import std.reflect.*
public class Foo {
public static func f1(v0: Int64, v1: Int64): Int64 {
return v0 + v1
}
}
main(): Unit {
var num = 0
let intInfo = TypeInfo.of<Int64>()
let funcInfo = TypeInfo.of<Foo>().getStaticFunction("f1", intInfo, intInfo)
num = (funcInfo.apply(TypeInfo.of<Foo>(), [1, 1]) as Int64).getOrThrow()
println(num)
}
Compiling and executing the above code will output:
2
Annotations
Cangjie provides several built-in compilation markers to support special case handling.
Built-in Compilation Markers for Integer Overflow Handling Strategies
Cangjie offers three built-in compilation markers to control integer overflow handling strategies: @OverflowThrowing, @OverflowWrapping, and @OverflowSaturating. These markers can currently only be applied to function declarations and affect integer operations and type conversions within the function. They correspond to the following three overflow handling strategies:
- Throwing Exceptions (throwing): Throws an exception when integer overflow occurs.
  @OverflowThrowing
  func add(a: Int8, b: Int8){
      return a + b
  }
  main() {
      add(100,29)
      /* Mathematically, 100 + 29 equals 129,
       * which causes an upper overflow in Int8's range,
       * resulting in an exception being thrown
       */
  }
  Note: For scenarios where integer overflow behavior is set to throwing, if the overflow can be detected at compile time, the compiler will directly report an error.
  @OverflowThrowing
  main() {
      let res: Int8 = Int8(100) + Int8(29) // Error, arithmetic operation '+' overflow
      // Mathematically, 100 + 29 equals 129, causing an upper overflow in Int8's range; the compiler detects and reports this
      let con: UInt8 = UInt8(-132) // Error, integer type conversion overflow
      /* -132 causes a lower overflow in UInt8's range,
       * resulting in an exception being thrown
       */
  }
- Wrapping (wrapping): When the result of an integer operation exceeds the representable range of the receiving memory space, the excess bits are truncated.
  @OverflowWrapping
  main() {
      let res: Int8 = Int8(105) * Int8(4)
      /* Mathematically, 105 * 4 equals 420,
       * whose binary representation is 1 1010 0100,
       * exceeding the 8-bit memory space for the result.
       * The truncated result is represented as 1010 0100 in binary,
       * corresponding to the signed integer -92
       */
      let temp: Int16 = Int16(-132)
      let con: UInt8 = UInt8(temp)
      /* -132's binary representation is 1111 1111 0111 1100,
       * exceeding the 8-bit memory space for the result.
       * The truncated result is represented as 0111 1100 in binary,
       * corresponding to the unsigned integer 124
       */
  }
- Saturating (saturating): When integer overflow occurs, the result is set to the extreme value of the corresponding fixed precision.
  @OverflowSaturating
  main() {
      let res: Int8 = Int8(-100) - Int8(45)
      /* Mathematically, -100 - 45 equals -145,
       * which causes a lower overflow in Int8's range,
       * so Int8's minimum value -128 is chosen as the result
       */
      let con: Int8 = Int8(1024)
      /* 1024 causes an upper overflow in Int8's range,
       * so Int8's maximum value 127 is chosen as the result
       */
  }
By default (i.e., when no such built-in compilation marker is applied), the throwing exception (@OverflowThrowing) strategy is used.
In practice, choose the overflow strategy based on business requirements. For example, to implement a safe operation on Int32 whose result must equal the mathematical result of the computation, use the throwing strategy.
Counterexample:
// The result is truncated
@OverflowWrapping
func operation(a: Int32, b: Int32): Int32 {
a + b // No exception will be thrown when overflow occurs
}
This incorrect example uses the wrapping overflow strategy. For instance, when the parameters a and b are large enough to cause overflow, the result will be truncated, leading to a mismatch between the function’s return value and the mathematical expression a + b.
Correct Example:
// Secure
@OverflowThrowing
func operation(a: Int32, b: Int32): Int32 {
a + b
}
main(): Int64 {
try {
operation(Int32.Max, 1)
} catch (e: ArithmeticException) {
println(e.message)
//Handle error
}
0
}
This correct example uses the throwing exception strategy. When the parameters a and b cause integer overflow, the operation function throws an exception.
The following table summarizes mathematical operators that may cause integer overflow.
| Operator | Overflow | Operator | Overflow | Operator | Overflow | Operator | Overflow |
|---|---|---|---|---|---|---|---|
| `+` | Y | `-=` | Y | `<<` | N | `<` | N |
| `-` | Y | `*=` | Y | `>>` | N | `>` | N |
| `*` | Y | `/=` | Y | `&` | N | `>=` | N |
| `/` | Y | `%=` | N | `\|` | N | `<=` | N |
| `%` | N | `<<=` | N | `^` | N | `==` | N |
| `++` | Y | `>>=` | N | `**=` | Y | | |
| `--` | Y | `&=` | N | `!` | N | | |
| `=` | N | `\|=` | N | `!=` | N | | |
| `+=` | Y | `^=` | N | `**` | Y | | |
Test Framework Built-in Compilation Markers
When using mocks in tests, if the mocked object involves static or top-level declarations, the test framework’s built-in compilation marker @EnsurePreparedToMock must be used to instruct the compiler to prepare these declarations for mocking.
This marker can only be applied to lambda expressions where the last expression calls a static or top-level declaration. The compiler will then prepare this declaration for mocking.
Example:
package prod
public func test(a: String, b: String): String {
a + b
}
package test
import prod.*
import std.unittest.mock.*
@Test
public class TestA {
@TestCase
func case1(): Unit {
{ =>
let matcher0 = Matchers.eq("z")
let matcher1 = Matchers.eq("y")
let stubCall = @EnsurePreparedToMock { => return(test(matcher0.value(), matcher1.value())) }
ConfigureMock.stubFunction(stubCall,[matcher0.withDescription(#"eq("z")"#), matcher1.withDescription(#"eq("y")"#)], Option<String>.None, "test", #"test("z", "y")"#, 15)
}().returns("mocked value")
println(test("z", "y")) // prints "mocked value"
}
}
In this example, ConfigureMock.stubFunction registers a stub for the function test, and returns sets the stub’s return value.
Note:
Typically, the standard library’s mock interfaces should be used to define mock declarations. Direct use of `@EnsurePreparedToMock` is discouraged unless necessary. Standard library functions internally use this marker when needed.
Constraints for using @EnsurePreparedToMock:
- Only allowed when compiling with test and mock-related options (`--test`/`--test-only` and `--mock=on`/`--mock=runtime-error`).
- Can only be applied to lambdas with a suitable last expression.
- The lambda’s last expression must be a call, member access, or reference expression involving:
  - Top-level functions or variables;
  - Static functions, properties, or fields;
  - Foreign declarations;
  - Not local functions or variables;
  - Non-private declarations;
  - Not const expressions or declarations;
  - Must be from a package built in mock mode.
Custom Annotations
Custom annotations allow reflection (see Reflection Chapter) to retrieve additional metadata beyond type information, enabling more complex logic.
Developers can create custom annotations by marking a class with @Annotation. The class must not be abstract, open, or sealed, and must provide at least one const init function; otherwise, the compiler will report an error.
The following example defines a custom annotation @Version and applies it to classes A and B. In main, reflection is used to retrieve and print the @Version annotation information.
package pkg
import std.reflect.TypeInfo
@Annotation
public class Version {
let code: String
const init(code: String) {
this.code = code
}
}
@Version["1.0"]
class A {}
@Version["1.1"]
class B {}
main() {
let objects = [A(), B()]
for (obj in objects) {
let annOpt = TypeInfo.of(obj).findAnnotation<Version>()
if (let Some(ann) <- annOpt) {
println(ann.code)
}
}
}
Compiling and running this code outputs:
1.0
1.1
Annotation information must be generated at compile time and bound to the type. Custom annotations must be instantiated using const init with valid arguments. The syntax for applying an annotation is similar to macro syntax: the arguments inside the [] brackets are passed by position or by name, and each must be a const expression (see Constant Evaluation Chapter). For annotation types with a no-argument const init, the brackets can be omitted.
The following example defines a custom annotation @Marked with a no-argument const init. Both @Marked and @Marked[] are valid usages.
package pkg
import std.reflect.TypeInfo
@Annotation
public class Marked {
const init() {}
}
@Marked
class A {}
@Marked[]
class B {}
main() {
if (TypeInfo.of(A()).findAnnotation<Marked>().isSome()) {
println("A is Marked")
}
if (TypeInfo.of(B()).findAnnotation<Marked>().isSome()) {
println("B is Marked")
}
}
Compiling and running this code outputs:
A is Marked
B is Marked
The same annotation class cannot be applied multiple times to the same target (i.e., no duplicate annotations).
@Marked
@Marked // Error
class A {}
Annotations are not inherited, so a type's annotation metadata comes only from the annotations declared on its own definition. If annotation metadata from a parent type is needed, developers must query it manually using the reflection interfaces.
In the following example, `A` is annotated with `@Marked` and `B` inherits from `A`, but `B` does not inherit `A`'s annotation.
package pkg
import std.reflect.TypeInfo
@Annotation
public class Marked {
const init() {}
}
@Marked
open class A {}
class B <: A {}
main() {
if (TypeInfo.of(A()).findAnnotation<Marked>().isSome()) {
println("A is Marked")
}
if (TypeInfo.of(B()).findAnnotation<Marked>().isSome()) {
println("B is Marked")
}
}
When compiling and executing the above code, the output is:
A is Marked
Custom annotations can be applied to type declarations (class, struct, enum, interface), parameters in member functions/constructors, constructor declarations, member function declarations, member variable declarations, and member property declarations. They can also restrict their applicable locations to prevent misuse by developers. Such annotations need to specify the target parameter when declaring @Annotation, with the parameter type being Array<AnnotationKind>. Here, AnnotationKind is an enum defined in the standard library. When no target is specified, the custom annotation can be used in all the aforementioned locations. When targets are specified, it can only be used in the declared list.
public enum AnnotationKind {
| Type
| Parameter
| Init
| MemberProperty
| MemberFunction
| MemberVariable
}
In the following example, a custom annotation is restricted via target to only be applicable to member functions. Using it in other locations will cause a compilation error.
@Annotation[target: [MemberFunction]]
public class Marked {
const init() {}
}
class A {
@Marked // OK, member function
func marked() {}
}
@Marked // Error, type
class B {}
Cangjie-C Interoperability
To ensure compatibility with existing ecosystems, Cangjie supports calling C functions and also allows C to call Cangjie functions.
Calling C Functions from Cangjie
To call a C function in Cangjie, you need to declare the function using the @C and foreign keywords. However, @C can be omitted when modifying a foreign declaration.
For example, to call C’s rand and printf functions with the following signatures:
// stdlib.h
int rand();
// stdio.h
int printf (const char *fmt, ...);
The corresponding Cangjie code would be:
// declare the function by `foreign` keyword, and omit `@C`
foreign func rand(): Int32
foreign func printf(fmt: CString, ...): Int32
main() {
// call this function by `unsafe` block
let r = unsafe { rand() }
println("random number ${r}")
unsafe {
var fmt = LibC.mallocCString("Hello, No.%d\n")
printf(fmt, 1)
LibC.free(fmt)
}
}
Key points to note:
- The `foreign` modifier indicates an external function declaration. Functions marked with `foreign` can only be declared, not implemented.
- Parameters and return types of `foreign` functions must conform to the type mapping between C and Cangjie data types. Refer to Type Mapping for details.
- Since C functions may perform unsafe operations, calls to `foreign` functions must be wrapped in an `unsafe` block; otherwise a compilation error occurs.
- When `@C` is combined with `foreign`, the declaration must be a function declaration. Applying the combination to any other declaration (such as a variable, class, or interface) causes a compilation error.
- `@C` can only modify `foreign` functions, non-generic functions in the top-level scope, and `struct` types.
- `foreign` functions do not support named parameters or default values. Variadic parameters are allowed using `...` notation, but must appear last in the parameter list. Variadic arguments must satisfy the `CType` constraint but need not be of the same type.
- Although Cangjie (CJNative backend) provides stack expansion capability, the stack usage of C functions is opaque to Cangjie, so FFI calls into C functions still carry a risk of stack overflow (which may cause runtime crashes or undefined behavior). Developers should adjust the `cjStackSize` configuration based on actual needs.
Examples of invalid foreign declarations:
foreign func rand(): Int32 { // compiler error
return 0
}
@C
foreign var a: Int32 = 0 // compiler error
@C
foreign class A{} // compiler error
@C
foreign interface B{} // compiler error
CFunc
CFunc in Cangjie refers to functions that can be called by C code, which come in three forms:
- `foreign` functions modified by `@C`
- Cangjie functions modified by `@C`
- `CFunc` lambda expressions, which differ from regular lambdas in that they cannot capture variables.
// Case 1
foreign func free(ptr: CPointer<Int8>): Unit
// Case 2
@C
func callableInC(ptr: CPointer<Int8>) {
print("This function is defined in Cangjie.")
}
// Case 3
let f1: CFunc<(CPointer<Int8>) -> Unit> = { ptr =>
print("This function is defined with CFunc lambda.")
}
All three forms declare/define functions of type CFunc<(CPointer<Int8>) -> Unit>. CFunc corresponds to C’s function pointer type. This is a generic type where the type parameter represents the CFunc’s parameter and return types. Usage example:
foreign func atexit(cb: CFunc<() -> Unit>): Int32
Like foreign functions, other forms of CFunc must satisfy the CType constraint for parameters and return types, and do not support named parameters or default values.
When called within Cangjie code, CFunc must be invoked in an unsafe context.
Cangjie supports converting a CPointer<T> variable to a concrete CFunc, where CPointer’s type parameter T can be any type satisfying the CType constraint. Example:
main() {
var ptr = CPointer<Int8>()
var f = CFunc<() -> Unit>(ptr)
unsafe { f() } // core dumped when running, because the pointer is nullptr.
}
Note:
Converting a pointer to `CFunc` and invoking it is dangerous. Users must ensure the pointer points to a valid function address; otherwise runtime errors will occur.
inout Parameters
When calling CFunc in Cangjie, arguments can be modified with the inout keyword to form pass-by-reference expressions. These expressions have type CPointer<T>, where T is the type of the inout-modified expression.
Pass-by-reference expressions have the following constraints:
- Can only be used at `CFunc` call sites.
- The modified object’s type must satisfy `CType` but cannot be `CString`.
- The modified object cannot be defined with `let`, nor can it be a literal, input parameter, or other temporary value.
- Pointers passed to C via pass-by-reference expressions are only guaranteed valid during the function call. C code should not store these pointers for later use.
inout-modified variables can be top-level variables, local variables, or struct member variables, but cannot be directly or indirectly derived from class instance member variables.
Example:
foreign func foo1(ptr: CPointer<Int32>): Unit
@C
func foo2(ptr: CPointer<Int32>): Unit {
let n = unsafe { ptr.read() }
println("*ptr = ${n}")
}
let foo3: CFunc<(CPointer<Int32>) -> Unit> = { ptr =>
let n = unsafe { ptr.read() }
println("*ptr = ${n}")
}
struct Data {
var n: Int32 = 0
}
class A {
var data = Data()
}
main() {
var n: Int32 = 0
unsafe {
foo1(inout n) // OK
foo2(inout n) // OK
foo3(inout n) // OK
}
var data = Data()
var a = A()
unsafe {
foo1(inout data.n) // OK
foo1(inout a.data.n) // Error, n is derived indirectly from instance member variables of class A
}
}
Note:
When macro expansion is used, the `inout` parameter feature is not yet supported inside macro definitions.
unsafe
Interoperability with C introduces many unsafe factors, so Cangjie uses the unsafe keyword to mark unsafe calls across the C boundary.
Key points about unsafe:
- Can modify functions, expressions, or scopes.
- Functions modified by `@C` must be called in an `unsafe` context.
- `CFunc` calls must occur in an `unsafe` context.
- `foreign` function calls in Cangjie must occur in an `unsafe` context.
- When calling an `unsafe`-modified function, the call site must be in an `unsafe` context.
Usage example:
foreign func rand(): Int32
@C
func foo(): Unit {
println("foo")
}
var foo1: CFunc<() -> Unit> = { =>
println("foo1")
}
main(): Int64 {
unsafe {
rand() // Call foreign func.
foo() // Call @C func.
foo1() // Call CFunc var.
}
0
}
Note that regular lambdas cannot propagate unsafe attributes. When an unsafe lambda escapes, it can be called directly without an unsafe context without causing compilation errors. When needing to call unsafe functions within a lambda, it’s recommended to make the call within an unsafe block:
unsafe func A(){}
unsafe func B(){
var f = { =>
unsafe { A() } // Avoid calling A() directly without unsafe in a normal lambda.
}
return f
}
main() {
var f = unsafe{ B() }
f()
println("Hello World")
}
Calling Conventions
Calling conventions describe how callers and callees interact (e.g., parameter passing, stack cleanup). Both sides must use the same calling convention. Cangjie uses @CallingConv to represent calling conventions, supporting:
- CDECL: Default calling convention for clang’s C compiler across platforms.
- STDCALL: Calling convention used by Win32 APIs.
C functions called via FFI use CDECL by default when no calling convention is specified. Example calling C’s rand:
@CallingConv[CDECL] // Can be omitted, since CDECL is the default.
foreign func rand(): Int32
main() {
println(unsafe { rand() })
}
@CallingConv can only modify foreign blocks, individual foreign functions, and top-level CFunc functions. When modifying a foreign block, it applies the same convention to all functions within.
Usage Guidelines
- OS Thread-Local Variable Constraints
  When interoperating between Cangjie and C, using OS thread-local variables carries risks:
  - Thread-local variables include those defined with C’s `thread_local` or created via `pthread_key_create`.
  - Cangjie schedules its own threads, and a Cangjie thread may be scheduled onto any OS thread. Accessing another language’s thread-local variables from a Cangjie thread is therefore risky.
  Example of risky thread-local variable usage:
  // C language logic using thread_local
  static thread_local int64_t count = 0;
  int64_t getCount() {
      count++;
      return count;
  }
  foreign func getCount(): Int64
  // Cangjie invokes the preceding C language logic
  spawn {
      let r1 = unsafe { getCount() } // r1 equals 1
      sleep(Duration.second * 10)
      let r2 = unsafe { getCount() } // r2 may not be equal to 2
  }
- Thread Binding Constraints
  When Cangjie calls C for interop, the Cangjie thread may be scheduled onto any OS thread. Setting thread priority or thread affinity is therefore not recommended.
- Synchronization Primitive Guidelines
  When Cangjie calls C for interop, the Cangjie thread waits for the interop logic to complete. Long-blocking operations in the other language are not recommended.
- Fork Support
  If C code called by Cangjie creates child processes via `fork()`, Cangjie logic cannot be executed in child processes. Other OS threads in the same process are unaffected.
- Process Exit Considerations
  If C code called by Cangjie exits the process, shared resources may be released, potentially causing illegal access errors.
Type Mapping
Basic Types
The mapping between Cangjie and C for basic data types follows these general principles:
- Cangjie types do not include reference types that point to managed memory;
- Cangjie types and C types share the same memory layout.
For example, some basic type mappings are as follows:
| Cangjie Type | C Type | Size (byte) |
|---|---|---|
| `Unit` | `void` | 0 |
| `Bool` | `bool` | 1 |
| `UInt8` | `char` | 1 |
| `Int8` | `int8_t` | 1 |
| `UInt8` | `uint8_t` | 1 |
| `Int16` | `int16_t` | 2 |
| `UInt16` | `uint16_t` | 2 |
| `Int32` | `int32_t` | 4 |
| `UInt32` | `uint32_t` | 4 |
| `Int64` | `int64_t` | 8 |
| `UInt64` | `uint64_t` | 8 |
| `IntNative` | `ssize_t` | platform dependent |
| `UIntNative` | `size_t` | platform dependent |
| `Float32` | `float` | 4 |
| `Float64` | `double` | 8 |
Note:
Types like `int` and `long` have platform-dependent sizes, so programmers must explicitly choose the corresponding Cangjie types. In C interoperation scenarios, similar to `void` in C, the `Unit` type can only be used as a return type in `CFunc` and as a generic parameter in `CPointer`.
Cangjie also supports mapping with C’s struct and pointer types.
Structs
For struct types, Cangjie uses @C-annotated struct for correspondence. For example, given this C struct:
typedef struct {
long long x;
long long y;
long long z;
} Point3D;
The corresponding Cangjie type can be defined as:
@C
struct Point3D {
var x: Int64 = 0
var y: Int64 = 0
var z: Int64 = 0
}
If there’s a C function like:
Point3D addPoint(Point3D p1, Point3D p2);
The corresponding Cangjie declaration would be:
foreign func addPoint(p1: Point3D, p2: Point3D): Point3D
@C-annotated structs must satisfy these constraints:
- Member variable types must satisfy the `CType` constraint
- Cannot implement or extend `interface`s
- Cannot be used as associated value types for `enum`s
- Cannot be captured by closures
- Cannot have generic parameters
@C-annotated structs automatically satisfy the CType constraint.
Pointers
For pointer types, Cangjie provides CPointer<T> to correspond to C pointer types, where the generic parameter T must satisfy the CType constraint. For example, the C signature for malloc:
void* malloc(size_t size);
Can be declared in Cangjie as:
foreign func malloc(size: UIntNative): CPointer<Unit>
CPointer supports read/write operations, pointer arithmetic, null checks, and conversion to integer form. Detailed APIs can be found in The Cangjie Programming Language Library API. Read/write and pointer arithmetic are unsafe operations that may cause undefined behavior if performed on invalid pointers, requiring unsafe blocks.
Example usage:
foreign func malloc(size: UIntNative): CPointer<Unit>
foreign func free(ptr: CPointer<Unit>): Unit
@C
struct Point3D {
var x: Int64
var y: Int64
var z: Int64
init(x: Int64, y: Int64, z: Int64) {
this.x = x
this.y = y
this.z = z
}
}
main() {
let p1 = CPointer<Point3D>() // create a CPointer with null value
if (p1.isNull()) { // check if the pointer is null
print("p1 is a null pointer")
}
let sizeofPoint3D: UIntNative = 24
var p2 = unsafe { malloc(sizeofPoint3D) } // malloc a Point3D in heap
var p3 = unsafe { CPointer<Point3D>(p2) } // pointer type cast
unsafe { p3.write(Point3D(1, 2, 3)) } // write data through pointer
let p4: Point3D = unsafe { p3.read() } // read data through pointer
let p5: CPointer<Point3D> = unsafe { p3 + 1 } // offset of pointer
unsafe { free(p2) }
}
Cangjie supports forced type conversion between CPointer types, where both source and target generic parameters must satisfy the CType constraint:
main() {
var pInt8 = CPointer<Int8>()
var pUInt8 = CPointer<UInt8>(pInt8) // CPointer<Int8> convert to CPointer<UInt8>
}
Cangjie also supports converting a CFunc type variable to a concrete CPointer, where the generic parameter can be any CType-satisfying type:
foreign func rand(): Int32
main() {
var ptr = CPointer<Int8>(rand)
}
Warning:
While converting `CFunc` to a pointer is generally safe, performing `read` or `write` operations on the converted pointer may cause runtime errors.
Arrays
Cangjie uses VArray to map to C array types. VArray can be used as function parameters and @C struct members. When element type T in VArray<T, $N> satisfies the CType constraint, VArray<T, $N> also satisfies CType.
As function parameter types:
When VArray is used as a CFunc parameter, the function signature can only be CPointer<T> or VArray<T, $N>. When the parameter type is VArray<T, $N>, the argument is still passed as CPointer<T>.
Example:
foreign func cfoo1(a: CPointer<Int32>): Unit
foreign func cfoo2(a: VArray<Int32, $3>): Unit
Corresponding C definitions:
void cfoo1(int *a) { ... }
void cfoo2(int a[3]) { ... }
When calling CFunc, use inout with VArray variables:
var a: VArray<Int32, $3> = [1, 2, 3]
unsafe {
cfoo1(inout a)
cfoo2(inout a)
}
VArray cannot be used as a CFunc return type.
As @C struct members:
When VArray is used as a member of an @C struct, its memory layout matches that of the corresponding array member in the C struct, so the length and element type declared in Cangjie must be identical to those declared in C:
struct S {
int a[2];
int b[0];
};
In Cangjie:
@C
struct S {
var a = VArray<Int32, $2>(repeat: 0)
var b = VArray<Int32, $0>(repeat: 0)
}
Note:
C allows flexible array members (arrays of unspecified length) as the last struct member. Cangjie doesn’t support mapping structs containing flexible array members.
Strings
For C strings, Cangjie provides the CString type with these member functions:
- `init(p: CPointer<UInt8>)`: Construct from `CPointer`
- `func getChars()`: Get the string address as `CPointer<UInt8>`
- `func size(): Int64`: Get the string length
- `func isEmpty(): Bool`: Check if empty (returns true for null pointers)
- `func isNotEmpty(): Bool`: Check if not empty (returns false for null pointers)
- `func isNull(): Bool`: Check for null pointer
- `func startsWith(str: CString): Bool`: Check prefix
- `func endsWith(str: CString): Bool`: Check suffix
- `func equals(rhs: CString): Bool`: Equality check
- `func equalsLower(rhs: CString): Bool`: Case-insensitive equality
- `func subCString(start: UInt64): CString`: Substring from start (new allocation)
- `func subCString(start: UInt64, len: UInt64): CString`: Substring with length (new allocation)
- `func compare(str: CString): Int32`: Equivalent to C’s `strcmp(this, str)`
- `func toString(): String`: Convert to String
- `func asResource(): CStringResource`: Get resource representation
Convert String to CString using LibC.mallocCString, remembering to free the CString afterward.
Example:
foreign func strlen(s: CString): UIntNative
main() {
var s1 = unsafe { LibC.mallocCString("hello") }
var s2 = unsafe { LibC.mallocCString("world") }
let t1: Int64 = s1.size()
let t2: Bool = s2.isEmpty()
let t3: Bool = s1.equals(s2)
let t4: Bool = s1.startsWith(s2)
let t5: Int32 = s1.compare(s2)
let length = unsafe { strlen(s1) }
unsafe {
LibC.free(s1)
LibC.free(s2)
}
}
sizeOf/alignOf
Cangjie provides sizeOf and alignOf functions to get memory size and alignment (in bytes) for C-interoperable types:
public func sizeOf<T>(): UIntNative where T <: CType
public func alignOf<T>(): UIntNative where T <: CType
Example:
@C
struct Data {
var a: Int64 = 0
var b: Float32 = 0.0
}
main() {
println(sizeOf<Data>())
println(alignOf<Data>())
}
When running on a 64-bit machine, the output will be:
16
8
CType
In addition to the types provided in the type mapping section for interfacing with C-side types, Cangjie also offers a CType interface. This interface itself contains no methods and serves as a parent type for all C-interoperable types, facilitating use in generic constraints.
Important notes:
- The `CType` interface is an interface type in Cangjie and does not itself satisfy the `CType` constraint;
- The `CType` interface cannot be inherited or extended;
- The `CType` interface does not bypass subtype usage restrictions.
Example usage of CType:
func foo<T>(x: T): Unit where T <: CType {
match (x) {
case i32: Int32 => println(i32)
case ptr: CPointer<Int8> => println(ptr.isNull())
case f: CFunc<() -> Unit> => unsafe { f() }
case _ => println("match failed")
}
}
main() {
var i32: Int32 = 1
var ptr = CPointer<Int8>()
var f: CFunc<() -> Unit> = { => println("Hello") }
var f64 = 1.0
foo(i32)
foo(ptr)
foo(f)
foo(f64)
}
Execution results:
1
true
Hello
match failed
Calling Cangjie Functions from C
Cangjie provides the CFunc type to correspond with C-side function pointer types. C-side function pointers can be passed to Cangjie, and Cangjie can also construct variables corresponding to C function pointers to pass to the C side.
Assume a C library API as follows:
typedef void (*callback)(int);
void set_callback(callback cb);
Correspondingly, in Cangjie this function can be declared as:
foreign func set_callback(cb: CFunc<(Int32) -> Unit>): Unit
Variables of type CFunc can be passed from the C side or constructed in Cangjie. There are two methods to construct CFunc types in Cangjie: one is using functions decorated with @C, and the other is closures marked as CFunc types.
Functions decorated with @C indicate that their function signatures comply with C calling conventions, while their definitions remain in Cangjie. Functions decorated with foreign have their definitions on the C side.
Note:
For both `foreign`-decorated functions and `@C`-decorated functions, it is not recommended to use `CJ_` (case-insensitive) as a name prefix for these `CFunc`s, as this may conflict with standard library and runtime symbols internal to the compiler, leading to undefined behavior.
Example:
@C
func myCallback(s: Int32): Unit {
println("handle ${s} in callback")
}
main() {
// the argument is a function qualified by `@C`
unsafe { set_callback(myCallback) }
// the argument is a lambda with `CFunc` type
let f: CFunc<(Int32) -> Unit> = { i => println("handle ${i} in callback") }
unsafe { set_callback(f) }
}
Assuming the C function is compiled into a library named “libmyfunc.so”, the compilation command cjc -L. -lmyfunc test.cj -o test.out should be used to link this library with the Cangjie compiler. This will ultimately generate the desired executable.
Additionally, when compiling C code, enable the -fstack-protector-all/-fstack-protector-strong stack protection options. Cangjie code includes overflow checks and stack protection by default; once C code is introduced, the developer is responsible for guarding against overflows in the code reached through unsafe blocks.
Compilation Options
Using C interoperability typically requires manually linking C libraries. The Cangjie compiler provides corresponding compilation options.
- `--library-path <value>`, `-L <value>`, `-L<value>`: Specifies the directory containing the library files to be linked.
  The path specified by `--library-path <value>` will be added to the linker’s library search path. Additionally, paths specified in the `LIBRARY_PATH` environment variable will also be included in the linker’s library search paths, with paths specified via `--library-path` taking precedence over those in `LIBRARY_PATH`.
- `--library <value>`, `-l <value>`, `-l<value>`: Specifies the library file to be linked.
  The given library file will be passed directly to the linker. The library filename should follow the format `lib[arg].[extension]`.
For all compilation options supported by the Cangjie compiler, please refer to “Appendix > cjc Compilation Options”.
Example
This example demonstrates how to use C interoperability together with the write/read interfaces to assign and read the values of a struct.
C code:
// draw.c
#include<stdio.h>
#include<stdint.h>
typedef struct {
int64_t x;
int64_t y;
} Point;
typedef struct {
float x;
float y;
float z;
} Cube;
void drawPicture(Point* point, Cube* cube) {
point->x = 1;
point->y = 2;
printf("Draw Point finished.\n");
printf("Before draw cube\n");
printf("%f\n", cube->x);
printf("%f\n", cube->y);
printf("%f\n", cube->z);
cube->x = 4.4;
cube->y = 5.5;
cube->z = 6.6;
printf("Draw Cube finished.\n");
}
Cangjie code:
// main.cj
@C
struct Point {
var x: Int64 = 0
var y: Int64 = 0
}
@C
struct Cube {
var x: Float32 = 0.0
var y: Float32 = 0.0
var z: Float32 = 0.0
init(x: Float32, y: Float32, z: Float32) {
this.x = x
this.y = y
this.z = z
}
}
foreign func drawPicture(point: CPointer<Point>, cube: CPointer<Cube>): Unit // the C function returns void
main() {
let pPoint = unsafe { LibC.malloc<Point>() }
let pCube = unsafe { LibC.malloc<Cube>() }
var cube = Cube(1.1, 2.2, 3.3)
unsafe {
pCube.write(cube)
drawPicture(pPoint, pCube) // in which x, y will be changed
println(pPoint.read().x)
println(pPoint.read().y)
println(pCube.read().x)
println(pCube.read().y)
println(pCube.read().z)
LibC.free(pPoint)
LibC.free(pCube)
}
}
Compilation command for Cangjie code (using CJNative backend as an example):
cjc -L . -l draw ./main.cj
In the compilation command, -L . indicates that the linker should search the current directory for libraries (assuming libdraw.so exists in the current directory), and -l draw specifies the name of the library to link. Upon successful compilation, the default output is a binary file named main. The command to execute the binary is:
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./main
Execution results:
Draw Point finished.
Before draw cube
1.100000
2.200000
3.300000
Draw Cube finished.
1
2
4.400000
5.500000
6.600000
Cross-Platform
Cangjie provides cross-platform development capabilities that address code reuse when one application targets multiple platforms. Users can separate common code from platform-specific code and share the common part across platforms, reducing the time spent developing and maintaining identical code for each platform.
Note:
The cross-platform development feature is experimental, and using it may involve risks.
Introduction to Cross-Platform Development Features
Common Code and Platform-Specific Code
The platform-agnostic part of a codebase is referred to as common code: code that can run on all target platforms. This typically includes algorithms, business logic, or other modules that do not depend on platform-specific functionality. The platform-dependent part of a codebase is referred to as platform-specific code: code that can only run on specific platforms, usually because it calls operating system, hardware, or other platform-specific functionality.
Both common code and platform-specific code belong to the same package. Platform files can depend on common files, but common files cannot depend on platform files.
Common code is shared across platforms and can be marked with the `common` modifier. Platform-specific code distinguishes the implementations for different platforms and can be marked with the `specific` modifier. The following rules apply when using the `common`/`specific` modifiers:
- The `common` modifier can only appear in common code, and the `specific` modifier can only appear in platform-specific code.
- The `common`/`specific` modifiers conflict with the `private`/`const`/`foreign` modifiers and cannot be used together with them.
The following example defines common code and a global function foo:
package cmp
public common func foo(): Unit {
println("I am common")
}
The following example defines platform-specific code and a global function foo:
package cmp
public specific func foo(): Unit {
println("I am platform")
}
More details will be described in the cross-platform development chapter.
Types Supporting Cross-Platform Development Features
Below are detailed usage rules for types that support cross-platform development features.
Global Functions
Global functions support cross-platform features. Users can use the common and specific modifiers for global functions.
A common global function may or may not include an implementation.
common func foo(): Int64
common func goo(a: Int64): Int64 { 1 }
In the above example, two common global functions are defined. The function foo has no function body, while goo includes a function body. Both are valid definitions of common global functions.
common/specific global functions must adhere to the following restrictions:
- A common global function must specify its return type.
- If a common global function has a complete implementation, a specific global function is not required. If a common global function lacks a complete implementation, a specific global function must be defined.
- The function signature of a specific global function must match that of the corresponding common global function in the same package, meaning parameter types and return types must be consistent. Additionally, the following rules must be satisfied:
  - The common global function and the corresponding specific global function must use the same modifiers (e.g., public, unsafe, etc.), except for common/specific.
  - If the common global function uses named parameters, the corresponding positions in the specific global function must use parameters with the same names.
  - If the common global function includes default values, the corresponding positions in the specific global function must use named parameters with the same names. Default values are not supported in specific global functions.
  - Each specific global function must match a unique common global function. Multiple specific global functions cannot match the same common global function.
Example:
In a common file, some common global functions can be defined:
// common file
package cjmp
common func foo1() // error: 'common' function return type must be specified
common func foo2(): Unit // ok
common func foo3(a!: Int64): Unit // ok
common func foo4(a!: Int64 = 1): Unit // ok
common func foo5(a: Int64): Unit { println("hello world") } // ok
In a platform file, specific global functions can be defined based on the common global functions:
// specific file
package cjmp
specific func foo2(a: Int64): Unit {} // error: different arguments
specific func foo2(): Int64 {} // error: different return type
public specific func foo2(): Unit {} // error: different modifiers
specific func foo2(): Unit {} // ok
specific func foo3(a!: Int64): Unit { println("hello world") } // ok
specific func foo4(a!: Int64 = 1): Unit {} // error: 'specific' function parameter can not have default value
specific func foo4(a!: Int64): Unit {} // ok
// common func foo5 has a complete implementation, so no platform definition is needed.
class
Cangjie classes support cross-platform features. Users can use the common and specific modifiers for classes and their members.
// common file
package cmp
common class A {
common var a: Int64 = 1
common init()
common func foo(): Unit
common prop p: Int64
}
// specific file
package cmp
specific class A {
specific var a: Int64 = 2
specific init() {}
specific func foo(): Unit {}
specific prop p: Int64 {
get() { a }
}
}
If a common class exists, there must be a matching specific class with the following requirements:
- The visibility of the common class and specific class must be the same.
- The interface implementation of the common class and specific class must be the same.
- The inheritance of the common class and specific class must be the same.
- A common open class matches a specific open class.
- A common abstract class matches a specific abstract class.
- A common sealed abstract class matches a specific sealed abstract class.
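The matching rules above can be illustrated with a small pair of declarations. This is only a sketch: the class name Shape and the member function name() are hypothetical, and both declarations use the default visibility so that visibility matches on both sides.

```cangjie
// common file
package cmp

// Hypothetical class; 'open' appears on both sides so the pair matches.
common open class Shape {
    // Signature only; the implementation comes from the specific side.
    common func name(): String
}

// specific file
package cmp

specific open class Shape {
    specific func name(): String { "platform shape" }
    // At least one explicitly defined constructor must exist on one side.
    init() {}
}
```

Dropping open from only one of the two declarations would break the match, since a common open class is matched by a specific open class.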
Class Constructors
Constructors and primary constructors support cross-platform features. The following requirements must be met:
- A common init can have a concrete implementation or only a function signature, with the implementation provided by specific init.
- If a common init has a complete implementation, the specific init can be omitted. Otherwise, a matching specific init must exist.
- The visibility of common init and specific init must be the same.
- The specific init implementation overrides the common init implementation.
- The rules for primary constructors are the same as for constructors.
- common/specific classes support regular constructors, which can be defined in either common or specific classes.
- At least one explicitly defined constructor must exist in the common or specific class.
- Static initializers cannot be modified with common/specific.
// common file
package cmp
common class A {
common A()
common init(a: String) {}
init(a: Bool) {}
}
// specific file
package cmp
specific class A {
specific A() {}
specific init(a: String) {
println(a)
}
init(a: Int64) {}
}
Class Member Variables
common and specific class member variables must adhere to the following restrictions:
- common/specific member variables must specify their types.
- The type, mutability, and visibility of common and specific member variables must be the same.
- A common member variable can be initialized directly or in a constructor, or it can declare only its type and be initialized on the specific side.
- common/specific classes support regular member variables, which can be defined in either common or specific classes.
- Static member variables of classes do not currently support cross-platform features but will be supported in future versions.
// common file
package cmp
common class A {
common let a: Int64 = 1
common var b: Int64
common var c: Int64
init() {
b = 1
c = 1
}
}
// specific file
package cmp
specific class A {
specific let a: Int64 = 2
specific var b: Int64 = 2
init(input: Int64) { c = input }
}
Class Member Functions
common and specific class member functions must adhere to the following restrictions:
- A common member function can have a concrete implementation or only a function signature, with the implementation provided by a specific member function.
- If a common member function has a complete implementation, the specific member function can be omitted. Otherwise, a matching specific member function must exist.
- The parameters, return type, and modifiers (except common/specific) of common and specific member functions must be the same.
- common/specific classes support regular member functions, which can be defined in either common or specific classes.
// common file
package cmp
common class A {
common func foo1(a: Int64): Unit
common func foo2(): Unit {}
common func foo3(): Unit {}
func foo4() {}
}
// specific file
package cmp
specific class A {
specific func foo1(a: Int64): Unit { println(a) }
specific func foo3(): Unit { println("platform") }
func foo5(): Int64 { 1 }
init() {}
}
Class Properties
common and specific class properties must adhere to the following restrictions:
- A common property can have a concrete implementation or only a property signature, with the implementation provided by a specific property.
- If a common property has a complete implementation, the specific property can be omitted. Otherwise, a matching specific property must exist.
- The type, visibility, and assignability of common and specific properties must be the same.
- common/specific classes support regular properties, which can be defined in either common or specific classes.
// common file
package cmp
common class A {
common prop a: Int64
common prop b: Int64 {
get() { 1 }
}
common prop c: Int64 {
get() { 1 }
}
prop d: Int64 {
get() { 1 }
}
}
// specific file
package cmp
specific class A {
specific prop a: Int64 {
get() { 1 }
}
specific prop c: Int64 {
get() { 2 }
}
prop e: Int64 {
get() { 1 }
}
init() {}
}
Class Inheritance
Inheritance for common/specific classes does not currently support cross-platform features but will be supported in future versions.
struct
Cangjie structs support cross-platform features. Users can use the common and specific modifiers for structs and their members.
// common file
package cmp
common struct A {
common var a: Int64 = 1
common init()
common func foo(): Unit
common prop p: Int64
}
// specific file
package cmp
specific struct A {
specific var a: Int64 = 2
specific init() {}
specific func foo(): Unit {}
specific prop p: Int64 {
get() { a }
}
}
If a common struct exists, there must be a matching specific struct with the following requirements:
- The visibility of the common struct and specific struct must be the same.
- The interface implementation of the common struct and specific struct must be the same.
- The common struct and specific struct must both be annotated with @C or neither.
Struct Constructors
Constructors support cross-platform features. The following requirements must be met:
- A common init can have a concrete implementation or only a function signature, with the implementation provided by specific init.
- If a common init has a complete implementation, the specific init can be omitted. Otherwise, a matching specific init must exist.
- The visibility of common init and specific init must be the same.
- The specific init implementation overrides the common init implementation.
- common/specific structs support regular constructors, which can be defined in either common or specific structs.
- Static initializers cannot be modified with common/specific.
// common file
package cmp
common struct A {
common init(a: String) {}
init(a: Bool) {}
}
// specific file
package cmp
specific struct A {
specific init(a: String) {
println(a)
}
init(a: Int64) {}
}
Struct Member Variables
common and specific struct member variables must adhere to the following restrictions:
- common/specific member variables must specify their types.
- The type, mutability, and visibility of common and specific member variables must be the same.
- A common member variable can be initialized directly or in a constructor, or it can declare only its type and be initialized on the specific side.
- common/specific structs support regular member variables, which can be defined in either common or specific structs.
- Static member variables of structs do not currently support cross-platform features but will be supported in future versions.
// common file
package cmp
common struct A {
common let a: Int64 = 1
common var b: Int64
common var c: Int64
init() {
b = 1
c = 1
}
}
// specific file
package cmp
specific struct A {
specific let a: Int64 = 2
specific var b: Int64 = 2
init(input: Int64) { c = input }
}
Struct Member Functions
common and specific struct member functions must adhere to the following restrictions:
- A common member function can have a concrete implementation or only a function signature, with the implementation provided by a specific member function.
- If a common member function has a complete implementation, the specific member function can be omitted. Otherwise, a matching specific member function must exist.
- The parameters, return type, and modifiers (except common/specific) of common and specific member functions must be the same.
- common/specific structs support regular member functions, which can be defined in either common or specific structs.
// common file
package cmp
common struct A {
common func foo1(a: Int64): Unit
common func foo2(): Unit {}
common func foo3(): Unit {}
func foo4() {}
}
// specific file
package cmp
specific struct A {
specific func foo1(a: Int64): Unit { println(a) }
specific func foo3(): Unit { println("platform") }
func foo5(): Int64 { 1 }
init() {}
}
Struct Properties
common and specific struct properties must adhere to the following restrictions:
- A common property can have a concrete implementation or only a property signature, with the implementation provided by a specific property.
- If a common property has a complete implementation, the specific property can be omitted. Otherwise, a matching specific property must exist.
- The type, visibility, and assignability of common and specific properties must be the same.
- common/specific structs support regular properties, which can be defined in either common or specific structs.
// common file
package cmp
common struct A {
common prop a: Int64
common prop b: Int64 {
get() { 1 }
}
common prop c: Int64 {
get() { 1 }
}
prop d: Int64 {
get() { 1 }
}
}
// specific file
package cmp
specific struct A {
specific prop a: Int64 {
get() { 1 }
}
specific prop c: Int64 {
get() { 2 }
}
prop e: Int64 {
get() { 1 }
}
init() {}
}
enum
Cangjie enums support cross-platform features. Users can use the common and specific modifiers for enums and their members.
// common file
package cmp
common enum A {
| ELEMENT
common func foo(): Unit
common prop p: Int64
}
// specific file
package cmp
specific enum A {
| ELEMENT
specific func foo(): Unit {}
specific prop p: Int64 {
get() { 1 }
}
}
If a common enum exists, there must be a matching specific enum with the following requirements:
- The visibility of the common enum and specific enum must be the same.
- The interface implementation of the common enum and specific enum must be the same.
- The corresponding constructors in the common enum and specific enum must be of the same type.
- If the common enum is an exhaustive enum, the specific enum must also be exhaustive. If the common enum is non-exhaustive, the specific enum can be exhaustive.
  - For exhaustive enums, the specific enum must include all constructors from the common enum and cannot add new constructors.
  - For non-exhaustive enums, the specific enum must include all constructors from the common enum and can add new constructors.
// common file
package cmp
common enum A { ELEMENT1 | ELEMENT2 }
common enum B { ELEMENT1 | ELEMENT2 }
common enum C { ELEMENT1 | ELEMENT2 }
common enum D { ELEMENT1 | ELEMENT2 | ... }
common enum E { ELEMENT1 | ELEMENT2 | ... }
// specific file
package cmp
specific enum A { ELEMENT1 | ELEMENT2 } // ok
specific enum B { ELEMENT1 | ELEMENT2 | ELEMENT3 } // error: exhaustive enum cannot add new constructor
specific enum C { ELEMENT1 | ELEMENT2 | ... } // error: exhaustive 'common' enum cannot be matched with non-exhaustive 'specific' enum
specific enum D { ELEMENT1 | ELEMENT2 | ELEMENT3 } // ok
specific enum E { ELEMENT1 | ELEMENT2 | ELEMENT3 | ... } // ok
Enum Member Functions
common enums and specific enums must adhere to the following restrictions for member functions:
- common member functions may have concrete implementations or only retain function signatures, with implementations provided by specific member functions.
- If a common member function has a complete implementation, the corresponding specific member function can be omitted; otherwise, a matching specific member function must exist.
- The parameters, return types, and modifiers (excluding common/specific) of common and specific member functions must be identical.
- Both common and specific enums support regular member functions, which can be defined in either common or specific enums.
// common file
package cmp
common enum A {
| ELEMENT
common func foo1(a: Int64): Unit
common func foo2(): Unit {}
common func foo3(): Unit {}
func foo4() {}
}
// specific file
package cmp
specific enum A {
| ELEMENT
specific func foo1(a: Int64): Unit { println(a) }
specific func foo3(): Unit { println("platform") }
func foo5(): Int64 { 1 }
}
Enum Properties
common enums and specific enums must adhere to the following restrictions for properties:
- common properties may have concrete implementations or only retain property signatures, with implementations provided by specific properties.
- If a common property has a complete implementation, the corresponding specific property can be omitted; otherwise, a matching specific property must exist.
- The types, visibility, and mutability of common and specific properties must be identical.
- Both common and specific enums support regular properties, which can be defined in either common or specific enums.
// common file
package cmp
common enum A {
| ELEMENT
common prop a: Int64
common prop b: Int64 {
get() { 1 }
}
common prop c: Int64 {
get() { 1 }
}
prop d: Int64 {
get() { 1 }
}
}
// specific file
package cmp
specific enum A {
| ELEMENT
specific prop a: Int64 {
get() { 1 }
}
specific prop c: Int64 {
get() { 2 }
}
prop e: Int64 {
get() { 1 }
}
}
Interface
Cangjie interfaces support cross-platform features. Users can use common and specific modifiers for interfaces and their members.
// common file
package cmp
common interface A {
common func foo(): Unit
common prop p: Int64
}
// specific file
package cmp
specific interface A {
specific func foo(): Unit {}
specific prop p: Int64 {
get() { 1 }
}
}
If a common interface exists, a matching specific interface must also exist, subject to the following requirements:
- The visibility of common and specific interfaces must be identical.
- The interface implementation characteristics of common and specific interfaces must be identical.
- A common sealed interface must match a specific sealed interface.
- Direct subtypes of a sealed interface must be defined in the same common package.
Interface Member Functions
common interfaces and specific interfaces must adhere to the following restrictions for member functions:
- specific member functions can be omitted regardless of whether common member functions have complete implementations.
- The parameters, return types, and modifiers (excluding common/specific) of common and specific member functions must be identical.
- If a common member function includes a concrete implementation, the specific member function must also include a concrete implementation.
- Both common and specific interfaces support regular member functions, which can be defined in either common or specific interfaces.
- New regular functions added to specific interfaces must include complete implementations.
// common file
package cmp
common interface A {
common func foo1(a: Int64): Unit
common func foo2(): Unit
common func foo3(): Unit {}
func foo4(): Int64
}
// specific file
package cmp
specific interface A {
specific func foo1(a: Int64): Unit { println(a) }
specific func foo3(): Unit { println("platform") }
func foo5(): Int64 { 1 }
}
Interface Properties
common interfaces and specific interfaces must adhere to the following restrictions for properties:
- specific properties can be omitted regardless of whether common properties have complete implementations.
- The types, visibility, and mutability of common and specific properties must be identical.
- If a common property includes a concrete implementation, the specific property must also include a concrete implementation.
- Both common and specific interfaces support regular properties, which can exist in either common or specific interfaces.
- New properties added to specific interfaces must include complete implementations.
// common file
package cmp
common interface A {
common prop a: Int64
common prop b: Int64
common prop c: Int64 {
get() { 1 }
}
prop d: Int64
}
// specific file
package cmp
specific interface A {
specific prop a: Int64 {
get() { 1 }
}
specific prop c: Int64 {
get() { 2 }
}
prop e: Int64 {
get() { 1 }
}
}
extend
Cangjie’s extend supports cross-platform features, allowing users to use common and specific modifiers for extend and its members.
Note:
Generic extend does not currently support this feature.
// common file
package cmp
class A{}
common extend A {
common func foo(): Unit
common prop p: Int64
}
// specific file
package cmp
specific extend A {
specific func foo(): Unit {}
specific prop p: Int64 {
get() { 1 }
}
}
If there are one or more common extend declarations, there must be a unique matching specific extend, subject to the following requirements:
- When multiple common extend declarations without interfaces exist, there must be exactly one specific extend. It is prohibited to declare private functions with the same name across multiple common extend declarations.
- When a common extend with declared interfaces exists, the common extend and specific extend must have identical interface sets.
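The first rule can be sketched as follows. This is an illustration under an assumption: it reads the rule as meaning that the single specific extend supplies the implementations for the members declared across all interface-less common extend declarations. The member names foo and bar are hypothetical.

```cangjie
// common file
package cmp

class A {}

// Two common extends of the same type, neither declaring an interface.
common extend A {
    common func foo(): Unit
}

common extend A {
    common func bar(): Unit
}

// specific file
package cmp

// Exactly one specific extend matches both common extends above.
specific extend A {
    specific func foo(): Unit {}
    specific func bar(): Unit {}
}
```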
Member Functions of extend
Member functions in common extend and specific extend must adhere to the following constraints:
- A common member function may have a concrete implementation or only a function signature, with the implementation provided by the specific member function.
- If a common member function has a complete implementation, the corresponding specific member function may be omitted; otherwise, a matching specific member function must exist.
- Parameters, return types, and modifiers (excluding common/specific) must be identical between common and specific member functions.
- Both common extend and specific extend support regular member functions, which can be defined in either.
// common file
package cmp
class A{}
common extend A {
common func foo1(a: Int64): Unit
common func foo2(): Unit { println("common") }
func foo3(): Unit{}
}
// specific file
package cmp
specific extend A {
specific func foo1(a: Int64): Unit { println(a) }
specific func foo2(): Unit { println("platform") }
func foo4(): Int64 { 1 }
}
Properties of extend
Properties in common extend and specific extend must adhere to the following constraints:
- A common property may have a concrete implementation or only a property signature, with the implementation provided by the specific property.
- If a common property has a complete implementation, the corresponding specific property may be omitted; otherwise, a matching specific property must exist.
- Property types, visibility, and mutability must be identical between common and specific properties.
- Both common extend and specific extend support regular properties, which can be defined in either.
// common file
package cmp
class A{}
common extend A {
common prop a: Int64
common prop b: Int64 {
get() { 1 }
}
prop c: Int64{
get() { 1 }
}
}
// specific file
package cmp
specific extend A {
specific prop a: Int64 {
get() { 1 }
}
specific prop b: Int64 {
get() { 2 }
}
prop d: Int64 {
get() { 1 }
}
}
Cross-Platform Compilation
Users can compile cross-platform packages using cjc.
Note:
Import statements in the platform-specific code of a cross-platform package must be a superset of those in the common code; otherwise, compilation errors may occur.
Compilation with cjc
Given the following directory structure:
cjmp_project(package cjmp)
├── common
│ └── common.cj
├── platform
│ └── platform.cj
└── main.cj
- First, compile the file containing the common code.
  cjc --experimental common/common.cj --output-type=chir --output-dir ./common
- Next, compile the file containing the platform-specific code.
  cjc --experimental platform/platform.cj common/common.chir --common-part-cjo=./common/cjmp.cjo --output-type=dylib --output-dir ./platform
- When invoking code for different platforms, specify the platform by referencing the .so file generated from compiling the platform-specific file.
  cjc main.cj -o main --import-path=./platform -L./platform -lcjmp
Cross-Platform Development Example
Using the Platform() Interface to Retrieve the Platform Name
Common definition file.
// common.cj
package example.cmp
// Retrieve platform information
public common func Platform(): String
Linux platform file.
// linux.cj
package example.cmp
public specific func Platform(): String {
"Linux"
}
Windows platform file.
// windows.cj
package example.cmp
public specific func Platform(): String {
"Win64"
}
macOS platform file.
// macos.cj
package example.cmp
public specific func Platform(): String {
"Mac"
}
Application-side code.
// app.cj
import example.cmp.Platform
main() {
println("${Platform()}")
}
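Because Platform() returns an ordinary String, application code can also branch on the result at run time. The following is a minimal sketch rather than part of the original example: pathSeparator is a hypothetical helper, and the strings it matches are the ones returned by the platform files above.

```cangjie
// app.cj
import example.cmp.Platform

// Hypothetical helper: choose a path separator from the platform name.
func pathSeparator(): String {
    match (Platform()) {
        case "Win64" => "\\"   // value returned by windows.cj
        case _ => "/"          // "Linux", "Mac", or anything else
    }
}

main() {
    println("Path separator on ${Platform()}: ${pathSeparator()}")
}
```

The application source stays the same for every target; which platform file was compiled into the package decides which branch runs.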
cjc Usage
cjc is the compilation command for the Cangjie programming language, offering extensive functionality and corresponding compilation options. This chapter introduces its basic usage.
cjc-frontend (Cangjie Frontend Compiler) is provided alongside cjc through the Cangjie SDK. cjc-frontend can compile Cangjie source code into Cangjie’s intermediate representation (LLVM IR). It performs only frontend compilation: although cjc-frontend and cjc share some compilation options, the cjc-frontend process terminates once frontend compilation finishes. When using cjc, the Cangjie compiler automatically handles frontend and backend compilation as well as linking. cjc-frontend is provided solely as a standalone form of the frontend compiler. Unless you are a compiler developer, prefer cjc for compiling Cangjie code.
Basic Usage of cjc
This section introduces the basic usage of cjc. For details on compilation options, please refer to the cjc Compilation Options chapter.
The usage of cjc is as follows:
cjc [option] file...
Suppose there is a Cangjie file named hello.cj:
main() {
println("Hello, World!")
}
You can compile this file using the following command:
$ cjc hello.cj
An executable file named main will be generated in the working directory. By default, cjc compiles the given source code file into an executable and names it main.
The above describes cjc’s default behavior when no compilation options are provided. You can control cjc’s behavior using compilation options, such as instructing cjc to perform whole-package compilation or specifying the output file name.
Introduction to cjpm
CJPM (Cangjie Project Manager) is the official project management tool for the Cangjie language, designed to manage and maintain the module system of Cangjie projects while providing a simpler and unified compilation entry point with support for custom compilation commands. Through automated dependency management, it analyzes and merges multi-version third-party dependencies, eliminating concerns about version conflicts and significantly reducing developer workload. Additionally, it offers a native custom build mechanism for the Cangjie language, allowing developers to add pre-processing and post-processing workflows at different build stages. This enables flexible customization of the build process to meet various compilation requirements across different business scenarios.
Basic Usage of cjpm
Run cjpm -h to view the main interface, which consists of several sections: command description, usage examples (Usage), available subcommands (Available subcommands), supported options (Available options), and additional tips.
Cangjie Project Manager
Usage:
cjpm [subcommand] [option]
Available subcommands:
init Init a new cangjie module
check Check the dependencies
update Update cjpm.lock
tree Display the package dependencies in the source code
build Compile the current module
run Compile and run an executable product
test Unittest a local package or module
bench Run benchmarks in a local package or module
clean Clean up the target directory
install Install a cangjie binary
uninstall Uninstall a cangjie binary
Available options:
-h, --help help for cjpm
-v, --version version for cjpm
Use "cjpm [subcommand] --help" for more information about a command.
cjpm init initializes a new Cangjie module or workspace. By default, it creates a cjpm.toml file in the current directory, along with a src folder containing a default main.cj file. Custom initialization parameters can be viewed via cjpm init -h.
Example:
Input: cjpm init
Output: cjpm init success
cjpm build compiles the current Cangjie project. Before execution, it checks dependencies and then invokes cjc for compilation. It supports full compilation, incremental compilation, cross-compilation, parallel compilation, and more. Additional compilation features can be viewed via cjpm build -h. The cjpm build -V command prints all compilation process commands.
Example:
Input: cjpm build -V
Output:
compile package module1.package1: cjc --import-path target -p "src/package1" --output-type=staticlib -o target/release/module1/libmodule1.package1.a
compile package module1: cjc --import-path target -L target/release/module1 -lmodule1.package1 -p "src" --output-type=exe --output-dir target/release/bin -o main
cjpm build success
cjpm.toml Configuration File Explanation
The cjpm.toml configuration file defines basic information, dependencies, and compilation options. cjpm primarily parses and executes based on this file.
Example configuration:
[package] # Single-module configuration field; cannot coexist with [workspace]
cjc-version = "1.0.0" # Minimum required `cjc` version (mandatory)
name = "demo" # Module name and root package name (mandatory)
description = "nothing here" # Description (optional)
version = "1.0.0" # Module version (mandatory)
compile-option = "" # Additional compilation options (optional)
override-compile-option = "" # Additional global compilation options (optional)
link-option = "" # Linker passthrough options (optional)
output-type = "executable" # Compilation output type (mandatory)
src-dir = "" # Source code directory (optional)
target-dir = "" # Output directory (optional)
package-configuration = {} # Single-package configuration (optional)
[workspace] # Workspace management field; cannot coexist with [package]
members = [] # Workspace member modules (mandatory)
build-members = [] # Workspace compilation modules (subset of members, optional)
test-members = [] # Workspace test modules (subset of build-members, optional)
compile-option = "" # Additional compilation options for all workspace members (optional)
override-compile-option = "" # Additional global compilation options for all workspace members (optional)
link-option = "" # Linker passthrough options for all workspace members (optional)
target-dir = "" # Output directory (optional)
[dependencies] # Source code dependencies (optional)
coo = { git = "xxx", branch = "dev" } # Git dependency
doo = { path = "./pro1" } # Local source dependency
[test-dependencies] # Test-phase dependencies (format same as [dependencies], optional)
[script-dependencies] # Build script dependencies (format same as [dependencies], optional)
[replace] # Dependency replacement (format same as [dependencies], optional)
[ffi.c] # C library dependencies (optional)
clib1.path = "xxx"
[profile] # Command profile configuration (optional)
build = {} # Build command configuration
test = {} # Test command configuration
bench = {} # Benchmark command configuration
customized-option = {} # Custom passthrough options
[target.x86_64-unknown-linux-gnu] # Backend and platform-specific configuration (optional)
compile-option = "value1" # Additional compilation options for specific targets or cross-compilation (optional)
override-compile-option = "value2" # Additional global compilation options for specific targets or cross-compilation (optional)
link-option = "value3" # Linker passthrough options for specific targets or cross-compilation (optional)
[target.x86_64-w64-mingw32.dependencies] # Dependencies for specific targets (optional)
[target.x86_64-w64-mingw32.test-dependencies] # Test-phase dependencies for specific targets (optional)
[target.x86_64-unknown-linux-gnu.bin-dependencies] # Cangjie binary library dependencies for specific targets or cross-compilation (optional)
path-option = ["./test/pro0", "./test/pro1"] # Binary library dependencies via directory paths
[target.x86_64-unknown-linux-gnu.bin-dependencies.package-option] # Binary library dependencies via single files
"pro0.xoo" = "./test/pro0/pro0.xoo.cjo"
"pro0.yoo" = "./test/pro0/pro0.yoo.cjo"
"pro1.zoo" = "./test/pro1/pro1.zoo.cjo"
Conditional Compilation
Developers can achieve conditional compilation through predefined or custom conditions. Currently, Cangjie supports conditional compilation for imports and declarations.
Conditional Compilation for Imports and Declarations
Cangjie supports using the built-in compilation marker @When for conditional compilation. Compilation conditions are enclosed in [], which can contain one or multiple sets of conditions. @When can be applied to import nodes and declaration nodes (except package).
Usage Example
Taking the built-in os compilation condition as an example, its usage is as follows:
@When[os == "Linux"]
class mc{}
main(): Int64 {
var a = mc()
return 0
}
The above code compiles and runs successfully on Linux. On any other system, compilation fails with an error indicating that the definition of class mc cannot be found.
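One way to keep such code compiling on every target is to supply a second declaration under the complementary condition. The sketch below reuses the class name mc from the example; the != operator is the one described for the os variable, and the empty fallback class is an assumption made purely for illustration.

```cangjie
// Compiled only when the target operating system is Linux.
@When[os == "Linux"]
class mc {}

// Fallback compiled for every other target, so that mc always exists.
@When[os != "Linux"]
class mc {}

main(): Int64 {
    let a = mc()
    return 0
}
```

The two conditions are mutually exclusive, so exactly one declaration of mc is visible to the compiler for any given target.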
Important notes:
- Cangjie does not support nested conditional compilation. The following syntax is prohibited:
@When[os == "Windows"]
@When[os == "Linux"] // Error, illegal nested when conditional compilation
import std.ast.*
@When[os == "Windows"]
@When[os == "Linux"] // Error, illegal nested when conditional compilation
func A(){}
- @When[...] is a built-in compilation marker processed before imports. If code generated by macro expansion contains @When[...], it will result in a compilation error, such as:
@Derive[ToString]
@When[os == "Linux"] // Error, unexpected when conditional compilation directive
class A {}
Built-in Conditional Variables
Cangjie provides the following built-in conditional variables: os, arch, env, backend, cjc_version, debug and test.
os
os represents the target platform’s operating system. It supports == and != operators. Supported operating systems include: Windows, Linux, macOS, iOS.
Usage example:
@When[os == "Linux"]
func foo() {
print("Linux, ")
}
@When[os == "Windows"]
func foo() {
print("Windows, ")
}
@When[os != "Windows"]
func fee() {
println("NOT Windows")
}
@When[os != "Linux"]
func fee() {
println("NOT Linux")
}
main() {
foo()
fee()
}
When compiled and executed in a Windows environment, it will output Windows, NOT Linux; in a Linux environment, it will output Linux, NOT Windows.
arch
arch represents the target platform’s processor architecture. It supports == and != operators.
Supported architectures: x86_64, aarch64, arm.
Usage example:
@When[arch == "aarch64"]
var arch = "aarch64"
@When[arch == "x86_64"]
var arch = "x86_64"
@When[arch == "arm"]
var arch = "arm"
main() {
println(arch)
}
When compiled and executed on an x86_64 architecture platform, it will output x86_64; on an aarch64 architecture platform, it will output aarch64; on an arm architecture platform, it will output arm.
env
env provides additional information, such as the ABI (Application Binary Interface) of the target platform, to eliminate ambiguities between different target platforms. It supports == and != operators.
Supported environments: ohos, gnu, simulator, android, and the default (empty string).
Usage example:
@When[env == "ohos"]
var env = "ohos"
@When[env != "ohos"]
var env = "other"
main() {
println(env)
}
When compiled and executed on the OpenHarmony target platform, it will output ohos; when compiled for any other target platform, it will output other.
backend
backend represents the target platform’s backend type, supporting conditional compilation for multiple backends. It supports == and != operators.
Currently supported backends: cjnative.
Usage example:
@When[backend == "cjnative"]
func foo() {
print("cjnative backend")
}
@When[backend != "cjnative"]
func foo() {
print("not cjnative backend")
}
main() {
foo()
}
When compiled and executed with the cjnative backend package, it will output cjnative backend.
cjc_version
cjc_version is a built-in condition that allows developers to select code based on the current Cangjie compiler version. It supports ==, !=, >, <, >=, and <= operators. The format is xx.xx.xx, where each xx supports 1-2 digits. The comparison rule pads each part to 2 digits, e.g., 0.18.8 < 0.18.11, 0.18.8 == 0.18.08.
Usage example:
@When[cjc_version == "0.18.6"]
func foo() {
println("cjc_version equals 0.18.6")
}
@When[cjc_version != "0.18.6"]
func foo() {
println("cjc_version is NOT equal to 0.18.6")
}
@When[cjc_version > "0.18.6"]
func fnn() {
println("cjc_version is greater than 0.18.6")
}
@When[cjc_version <= "0.18.6"]
func fnn() {
println("cjc_version is less than or equal to 0.18.6")
}
@When[cjc_version < "0.18.6"]
func fee() {
println("cjc_version is less than 0.18.6")
}
@When[cjc_version >= "0.18.6"]
func fee() {
println("cjc_version is greater than or equal to 0.18.6")
}
main() {
foo()
fnn()
fee()
}
The output of the above code will vary depending on the cjc version.
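The padding rule for cjc_version can be modeled outside the compiler. The Python sketch below is only an illustration of the comparison described above, not the compiler's implementation; the helper name version_key is hypothetical.

```python
def version_key(version: str) -> tuple:
    """Turn "x.y.z" into a comparable tuple, padding each part to 2 digits."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected the form xx.xx.xx, got {version!r}")
    padded = []
    for part in parts:
        if not (part.isascii() and part.isdigit() and 1 <= len(part) <= 2):
            raise ValueError(f"each part needs 1-2 digits, got {part!r}")
        padded.append(part.zfill(2))  # "8" -> "08", "11" -> "11"
    return tuple(padded)


# The two examples given in the rule above
print(version_key("0.18.8") < version_key("0.18.11"))   # True
print(version_key("0.18.8") == version_key("0.18.08"))  # True
```

Because every part has the same width after padding, ordinary tuple comparison gives the same ordering as comparing the parts numerically.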
debug
debug indicates whether debug mode is enabled (i.e., the -g compilation option is used). It can be used to switch between debug and release builds. It only supports the logical NOT operator (!).
Usage example:
@When[debug]
func foo() {
println("debug")
}
@When[!debug]
func foo() {
println("NOT debug")
}
main() {
foo()
}
When compiled with -g, it will output debug; without -g, it will output NOT debug.
test
test indicates whether the unit test option --test is enabled. It only supports the logical NOT operator (!). It can be used to distinguish test code from regular code.
Usage example:
@When[test]
@Test
class Tests {
@TestCase
public func case1(): Unit {
@Expect("run", foo())
}
}
func foo() {
"run"
}
@When[!test]
main () {
println(foo())
}
When compiled with --test, it will produce test results; without --test, it will compile and run normally, outputting run.
Custom Conditional Variables
Cangjie allows developers to define custom conditional variables and values. Custom variable names must be valid identifiers and cannot conflict with built-in variables. Their values are string literals. Custom conditions support == and != operators. Unlike built-in variables, custom conditions must be defined via the --cfg compilation option or in the cfg.toml configuration file.
Configuring Custom Conditional Variables
There are two ways to configure custom conditional variables: directly via compilation options or through a configuration file.
Developers can use --cfg <value> to pass custom compilation conditions as key-value pairs or specify the search path for the cfg.toml configuration file.
- Option values must be enclosed in double quotes.
- If the option value contains =, it will be treated as a key-value pair (if the path contains =, it must be escaped with \). Multiple key-value pairs can be separated by commas. For example:
$ cjc --cfg "feature = lion, platform = dsp" source.cj
- Multiple --cfg options can be used, e.g.:
$ cjc --cfg "feature = lion" --cfg "platform = dsp" source.cj
- Defining the same variable multiple times is prohibited, e.g.:
$ cjc --cfg "feature = lion" --cfg "feature = meta" source.cj
$ cjc --cfg "feature = lion, feature = meta" source.cj
Both commands will result in errors.
- If the option value does not contain = or contains an escaped =, it will be treated as the search path for cfg.toml. For example:
$ cjc --cfg "./cfg" source.cj
If ./cfg/cfg.toml exists, the compiler will automatically read the custom conditions defined in it. The cfg.toml file should contain key-value pairs, with each pair on a separate line. Keys must be valid Cangjie identifiers, and values must be double-quoted strings (no escape sequences). Full-line and inline comments are supported, e.g.:
feature = "lion"
platform = "dsp"
# Full-line comment
feature = "meta" # Inline comment
- When multiple --cfg options specify cfg.toml search paths, they are searched in the order provided. If no cfg.toml is found in any path, the compiler will search for cfg.toml in the default path.
- If any --cfg option directly provides key-value pairs, the cfg.toml configuration will be ignored.
- If no --cfg option is used, the compiler will search for cfg.toml in the default path (the package directory specified by --package or -p, or the cjc execution directory).
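The rules above for interpreting --cfg values can be summarized in a short Python model. This is a sketch of the documented behavior only, not how cjc is implemented; the function names are hypothetical, and values that mix an escaped = with key-value pairs are not handled.

```python
import re


def parse_cfg_value(value: str):
    """Classify one --cfg value as ("pairs", dict) or ("path", str)."""
    # An unescaped '=' means key-value pairs; '\=' is a literal '=' in a path.
    if re.search(r"(?<!\\)=", value) is None:
        return ("path", value.replace("\\=", "="))
    pairs = {}
    for item in value.split(","):
        key, _, val = item.partition("=")
        key, val = key.strip(), val.strip()
        if key in pairs:
            raise ValueError(f"condition {key!r} defined more than once")
        pairs[key] = val
    return ("pairs", pairs)


def collect_cfg(values):
    """Merge several --cfg values into (conditions, cfg.toml search paths)."""
    conditions, search_paths = {}, []
    for value in values:
        kind, payload = parse_cfg_value(value)
        if kind == "path":
            search_paths.append(payload)
            continue
        for key, val in payload.items():
            if key in conditions:
                raise ValueError(f"condition {key!r} defined more than once")
            conditions[key] = val
    if conditions:
        search_paths = []  # direct key-value pairs make cfg.toml ignored
    return conditions, search_paths


print(collect_cfg(["feature = lion", "platform = dsp"]))
print(collect_cfg(["./cfg"]))
```

Passing both "feature = lion" and "feature = meta" raises an error in this model, mirroring the prohibition on defining the same variable twice.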
Multi-Conditional Compilation
Cangjie allows developers to combine multiple conditional compilation options freely. Logical operators and parentheses can be used to specify precedence.
Usage example 1:
//source.cj
@When[(test || feature == "lion") && !debug]
func fee() {
println("feature lion")
}
main() {
fee()
}
Compile and run the above code using the following command:
$ cjc --cfg="feature=lion" source.cj -o runner.out
The output will be as follows:
feature lion
Usage example 2:
When Cangjie code is cross-compiled to the target platform aarch64-linux-android31, the conditional variables take the values shown in the code below. To cross-compile to other platforms, refer to the Target platform and Conditional Compilation Mapping Table.
@When[os == "Linux" && arch == "aarch64" && env == "android"]
func foo() {
"target aarch64-linux-android31 run"
}
main() {
println(foo())
}
Appendix
Target platform and Conditional Compilation Mapping Table
The target platforms supported by Cangjie’s cross-compilation are determined by the built-in conditional variables os, arch and env. The mapping between these three variables and the target platforms is shown in the table below:
| target platform | arch | os | env |
|---|---|---|---|
| x86_64-windows-gnu | "x86_64" | "Windows" | "gnu" |
| x86_64-linux-gnu | "x86_64" | "Linux" | "gnu" |
| x86_64-apple-darwin | "x86_64" | "macOS" | "" |
| x86_64-linux-ohos | "x86_64" | "Linux" | "ohos" |
| x86_64-w64-mingw32 | "x86_64" | "Windows" | "gnu" |
| x86_64-linux-android[26+][android target] | "x86_64" | "Linux" | "android" |
| aarch64-linux-gnu | "aarch64" | "Linux" | "gnu" |
| aarch64-linux-android[26+][android target] | "aarch64" | "Linux" | "android" |
| aarch64-apple-darwin | "aarch64" | "macOS" | "" |
| aarch64-linux-ohos | "aarch64" | "Linux" | "ohos" |
| arm64-apple-ios[11+][ios target] | "aarch64" | "iOS" | "" |
| arm64-apple-ios[11+]-simulator[ios target] | "aarch64" | "iOS" | "simulator" |
[android target] In x86_64-linux-android[26+], the number following android is the API Level. If no number is specified, the default API Level is 26; specifying a number (e.g., x86_64-linux-android33) indicates that the Android API Level is 33. The API Level must be greater than or equal to 26.
[ios target] In arm64-apple-ios[11+], the number following ios is the iOS version. If no number is specified, the default iOS version is 11; specifying a number (e.g., arm64-apple-ios26) indicates that the iOS version is 26. The iOS version must be greater than or equal to 11.
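For quick reference, the mapping table can also be expressed as a lookup. The Python sketch below simply transcribes the table and strips an optional Android API Level or iOS version before the lookup; the names TARGETS and conditional_vars are hypothetical and not part of any Cangjie tool.

```python
import re

# (arch, os, env) per target platform, transcribed from the table above
TARGETS = {
    "x86_64-windows-gnu": ("x86_64", "Windows", "gnu"),
    "x86_64-linux-gnu": ("x86_64", "Linux", "gnu"),
    "x86_64-apple-darwin": ("x86_64", "macOS", ""),
    "x86_64-linux-ohos": ("x86_64", "Linux", "ohos"),
    "x86_64-w64-mingw32": ("x86_64", "Windows", "gnu"),
    "x86_64-linux-android": ("x86_64", "Linux", "android"),
    "aarch64-linux-gnu": ("aarch64", "Linux", "gnu"),
    "aarch64-linux-android": ("aarch64", "Linux", "android"),
    "aarch64-apple-darwin": ("aarch64", "macOS", ""),
    "aarch64-linux-ohos": ("aarch64", "Linux", "ohos"),
    "arm64-apple-ios": ("aarch64", "iOS", ""),
    "arm64-apple-ios-simulator": ("aarch64", "iOS", "simulator"),
}


def conditional_vars(target: str):
    """Return (arch, os, env) for a target, ignoring any version suffix."""
    # Drop the API Level / iOS version: android31 -> android, ios17.5 -> ios
    base = re.sub(r"(android|ios)\d+(\.\d+)*", r"\1", target)
    return TARGETS[base]


print(conditional_vars("aarch64-linux-android31"))
```

A target that is not listed in the table raises KeyError in this sketch.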
Cross-Compilation
Developers can cross-compile their Cangjie programs to run on different architecture platforms. Cangjie supports the following cross-compilation scenarios:
| Compilation Platform | Target Platform | Common Scenarios / Tools | Corresponding SDK Installation Package |
|---|---|---|---|
| Windows (x64) | Android (aarch64) | Android physical devices | cangjie-sdk-windows-x64-android.x.y.z.zip (or .exe) |
| Linux (x64) | Android (aarch64) | Android physical devices | cangjie-sdk-linux-x64-android.x.y.z.tar.gz |
| macOS (aarch64/x64) | Android (aarch64) | Android physical devices and Android Studio Emulator | cangjie-sdk-mac-aarch64-android.x.y.z.tar.gz |
| macOS (aarch64) | Android (arm32) | Android physical devices | cangjie-sdk-mac-aarch64-android-arm32-x.y.z.tar.gz |
| macOS (aarch64) | iOS (aarch64) | iOS physical devices | cangjie-sdk-mac-aarch64-ios.x.y.z.tar.gz |
| macOS (aarch64) | iOS Simulator (aarch64/x86_64) | Xcode Simulator | cangjie-sdk-mac-aarch64-ios.x.y.z.tar.gz |
The Cangjie programming language now supports cross-compilation to Android (aarch64 defaults to API 26+, and arm32 defaults to API 23+) and iOS, enabling developers to build applications across different platforms.
Cross-Compiling Cangjie to Android
Package Download
Developers can use Cangjie installation packages that support cross-compilation for specific platforms (android-aarch64, android-arm32). Android aarch64 and Android arm32 require different SDK installation packages.
Cangjie installation packages supporting cross-compilation to Android are listed below (the exact version number depends on the actual release package):
- Android aarch64: cangjie-sdk-linux-x64-android.x.y.z.tar.gz, cangjie-sdk-windows-x64-android.x.y.z.zip, cangjie-sdk-windows-x64-android.x.y.z.exe, cangjie-sdk-mac-aarch64-android.x.y.z.tar.gz
- Android arm32: cangjie-sdk-mac-aarch64-android-arm32.tar.gz
For example: To cross-compile to Android aarch64 on a Linux x64 platform, download and install cangjie-sdk-linux-x64-android.x.y.z.tar.gz.
In addition to the cross-compilation-supported Cangjie package, you also need Android NDK (recommended: ndk-r27d).
Compilation
Cross-compiling Cangjie to Android requires the following three dependency directories:
- sysroot directory, provided by Android NDK, typically located at <ndk-path>/toolchains/llvm/prebuilt/<platform>/sysroot.
- Directory containing libclang_rt.builtins-<arch>-android.a (for example, libclang_rt.builtins-aarch64-android.a or libclang_rt.builtins-arm-android.a), provided by Android NDK, typically located at <ndk-path>/toolchains/llvm/prebuilt/<platform>/lib/clang/<version>/lib/linux.
- Toolchain binary directory, provided by Android NDK, typically located at <ndk-path>/toolchains/llvm/prebuilt/<platform>/bin.
When using cjc for cross-compilation, the following additional options must be specified (replace < > parts with actual directories):
- --target=aarch64-linux-android defaults to Android API 26 for cross-compilation; append the API level suffix explicitly for a higher version, e.g. --target=aarch64-linux-android31 for Android API 31.
- --target=arm-linux-android23 specifies the arm32 target platform (Android API 23).
- --sysroot=<sysroot-path> specifies the toolchain’s root directory path <sysroot-path>
- -L<lib-path> specifies the directory <lib-path> containing libclang_rt.builtins-<arch>-android.a
- -B<toolchain-bin-path> specifies the Android NDK toolchain binary directory <toolchain-bin-path>
For cross-compiling to aarch64, run:
$ cjc main.cj --target=aarch64-linux-android31 \
--sysroot /opt/buildtools/android_ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/sysroot \
-L /opt/buildtools/android_ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/lib/clang/14.0.6/lib/linux
- main.cj is the Cangjie code being cross-compiled, and aarch64-linux-android31 is the target platform
- /opt/buildtools/android_ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/sysroot is the toolchain root directory <sysroot-path>
- /opt/buildtools/android_ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/lib/clang/14.0.6/lib/linux is the directory <lib-path> containing libclang_rt.builtins-aarch64-android.a
For cross-compiling to arm32, run:
$ cjc test.cj --target=arm-linux-android23 \
--sysroot /home/rus/opt/android-ndk-r27d/toolchains/llvm/prebuilt/linux-x86_64/sysroot \
-L /home/rus/opt/android-ndk-r27d/toolchains/llvm/prebuilt/linux-x86_64/lib/clang/18/lib/linux/ \
-B/home/rus/opt/android-ndk-r27d/toolchains/llvm/prebuilt/linux-x86_64/bin
- test.cj is the Cangjie code being cross-compiled, and arm-linux-android23 is the target platform
- /home/rus/opt/android-ndk-r27d/toolchains/llvm/prebuilt/linux-x86_64/sysroot is the toolchain root directory <sysroot-path>
- /home/rus/opt/android-ndk-r27d/toolchains/llvm/prebuilt/linux-x86_64/lib/clang/18/lib/linux/ is the directory <lib-path> containing libclang_rt.builtins-arm-android.a
- /home/rus/opt/android-ndk-r27d/toolchains/llvm/prebuilt/linux-x86_64/bin is the toolchain binary directory <toolchain-bin-path>
Notes:
ARM32 does not support reflection-related features, stack expansion, runtime tracing, and other related functionalities.
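Because the three NDK directories share a common prefix, the cjc command line for Android can be assembled from a few inputs. The Python sketch below shows one way to do this under the directory layout described above; the function name, the NDK install path, and the clang version used in the example call are assumptions for illustration only.

```python
from pathlib import PurePosixPath


def android_cjc_command(source, target, ndk_path, host, clang_version):
    """Build the cjc argument list for cross-compiling to Android."""
    prebuilt = PurePosixPath(ndk_path) / "toolchains/llvm/prebuilt" / host
    return [
        "cjc", source,
        f"--target={target}",
        f"--sysroot={prebuilt / 'sysroot'}",                               # <sysroot-path>
        f"-L{prebuilt / 'lib/clang' / clang_version / 'lib/linux'}",       # <lib-path>
        f"-B{prebuilt / 'bin'}",                                           # <toolchain-bin-path>
    ]


cmd = android_cjc_command(
    "main.cj", "aarch64-linux-android31",
    "/opt/android-ndk",  # hypothetical NDK install location
    "linux-x86_64", "18",
)
print(" ".join(cmd))
```

The clang version directory differs between NDK releases, so it should be checked against the installed NDK rather than hard-coded.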
Deployment and Execution
After compilation, the following files need to be pushed to the Android device:
- The executable and all its dependent dynamic libraries: e.g., main and its dependent .so files
- Cangjie runtime dependencies: choose the corresponding directory based on the target platform (for example, aarch64-linux-android31 uses $CANGJIE_HOME/runtime/lib/linux_android31_aarch64_cjnative/*.so, while arm-linux-android23 uses $CANGJIE_HOME/runtime/lib/linux_android23_arm_cjnative/*.so; actual directory names depend on the SDK)
Use the Android Debug Bridge adb tool to push the executable and Cangjie libraries to the device. Example:
aarch64 target example:
$ adb push ./main /data/local/tmp/
$ adb push $CANGJIE_HOME/runtime/lib/linux_android31_aarch64_cjnative/* /data/local/tmp/
arm32 target example:
$ adb push ./main /data/local/tmp/
$ adb push $CANGJIE_HOME/runtime/lib/linux_android23_arm_cjnative/* /data/local/tmp/
For detailed usage of the adb tool, please refer to the Android Debug Bridge (adb) documentation on the official Android website.
.so dynamic library files can be deployed directly to system directories. If deployed in non-standard directories, add the directory to the LD_LIBRARY_PATH environment variable before execution.
To run the Cangjie program main:
$ adb shell "chmod +x /data/local/tmp/main"
$ adb shell "LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/main"
Cross-Compiling Cangjie to iOS
Package Download
The Cangjie SDK that supports cross-compiling for iOS is provided as cangjie-sdk-mac-aarch64-ios.x.y.z.tar.gz.
To cross-compile to iOS on a macOS platform, download and install the package that matches your host architecture. The Cangjie runtime and standard libraries natively support iOS 11 and above (for exceptions, refer to the “Cangjie Programming Language Library API” manual).
In addition to the Cangjie package that supports cross-compilation, you also need to download Xcode. After installation, install the iOS development components in Xcode. Refer to the “Downloading and installing additional Xcode components” section in the Xcode manual for specific steps.
Compilation
Currently, Cangjie cross-compilation to iOS only supports compiling static libraries. When cross-compiling Cangjie code to iOS devices, specify the following additional options:
- --target=aarch64-apple-ios<version> specifies the target platform ios for cross-compilation
- --output-type=staticlib specifies the output file type as a static library
Notes:
The version number specified by <version> is recommended to align with the SDK version supported by the installed Xcode (e.g., 17.5).
Currently, Cangjie supports cross-compiling to the x86_64 architecture iOS simulator from an aarch64 architecture environment (the compiled product for this architecture requires Rosetta in Xcode to run). For running on iOS simulators, specify:
- --target=aarch64-apple-ios-simulator or --target=x86_64-apple-ios-simulator specifies the target platform ios-simulator for cross-compilation
- --output-type=staticlib specifies the output file type as a static library
The compilation output must be added to an Xcode project and built through Xcode to create an iOS application.
Example command to compile main.cj into libmain.a static library:
cjc main.cj --output-type=staticlib --target=aarch64-apple-ios17.5 -o libmain.a
main.cj is the Cangjie code being cross-compiled, aarch64-apple-ios17.5 is the target platform, and --output-type=staticlib specifies the output file type as a static library.
In addition to adding the Cangjie compilation output to the Xcode project, the following configurations are required when building with Xcode:
- Select a directory based on the runtime target (device or simulator):
  - For devices: $CANGJIE_HOME/lib/ios_aarch64_cjnative
  - For simulators: $CANGJIE_HOME/lib/ios_simulator_aarch64_cjnative
  Add all .a files from the corresponding directory to the Xcode project.
- Configure the Build Settings > Other Linker Flags field in the Xcode project with the following values:
  - $CANGJIE_HOME/lib/ios_aarch64_cjnative/section.o
  - $CANGJIE_HOME/lib/ios_aarch64_cjnative/cjstart.o
  - -lc++
  Note: The linking options must be added in the exact order listed above. Replace $CANGJIE_HOME with the actual Cangjie installation directory. For simulator targets, replace ios_aarch64_cjnative with ios_simulator_aarch64_cjnative.
- Set the Build Settings > Dead Code Stripping field in the Xcode project to No.
After configuration, build the project directly through Xcode.
Deployment and Execution
Build and deploy to real devices or simulators through Xcode. Refer to the “Build and running an app” section in the Xcode manual for specific steps.
Deploying the Cangjie Runtime
To ensure the Cangjie executable can run properly across different operating system environments, the Cangjie language provides a runtime environment. This runtime environment grants Cangjie executables access to memory and other system resources, such as the dynamic libraries required during execution.
Installing the complete Cangjie toolchain includes both the Cangjie code compilation environment and the runtime installation (see the Installing the Cangjie Toolchain section for details). If code compilation is not required and only the execution of binaries is needed, the runtime can be deployed independently in the environment.
This section describes the deployment of the Cangjie runtime.
Important Note: When compiling with fully static linking of Cangjie libraries, the runtime modules are already embedded in the executable during compilation. Therefore, no additional runtime deployment is needed in the execution environment, and the compiled executable can be run directly.
Linux
- First, visit the official Cangjie distribution channels to download the installation package compatible with your platform architecture:
  - cangjie-sdk-linux-x64-x.y.z.tar.gz: For x86_64 architecture Linux systems.
  - cangjie-sdk-linux-aarch64-x.y.z.tar.gz: For aarch64 architecture Linux systems.
- Extract the downloaded package to an appropriate directory. After extraction, you will find a directory named cangjie in your current working path, containing all components of the Cangjie toolchain. The runtime directory under cangjie contains all dynamic libraries for the Cangjie runtime.
- Execute the following command in the runtime environment to complete the runtime deployment (replace ${CANGJIE_HOME} with the path to the cangjie directory and ${hw_arch} with the corresponding hardware architecture):
export LD_LIBRARY_PATH=${CANGJIE_HOME}/runtime/lib/linux_${hw_arch}_cjnative:${LD_LIBRARY_PATH}
macOS
- First, visit the official Cangjie distribution channels to download the installation package compatible with your platform architecture:
  - cangjie-sdk-mac-x64-x.y.z.tar.gz: For x86_64 architecture macOS systems.
  - cangjie-sdk-mac-aarch64-x.y.z.tar.gz: For aarch64/arm64 architecture macOS systems.
- Extract the downloaded package to an appropriate directory. After extraction, you will find a directory named cangjie in your current working path, containing all components of the Cangjie toolchain. The runtime directory under cangjie contains all dynamic libraries for the Cangjie runtime.
- Execute the following command in the runtime environment to complete the runtime deployment (replace ${CANGJIE_HOME} with the path to the cangjie directory and ${hw_arch} with the corresponding hardware architecture):
export DYLD_LIBRARY_PATH=${CANGJIE_HOME}/runtime/lib/darwin_${hw_arch}_cjnative:${DYLD_LIBRARY_PATH}
Windows
- First, visit the official Cangjie distribution channels to download the installation package compatible with your platform architecture:
  - cangjie-sdk-windows-x64-x.y.z.zip: For x86_64 architecture Windows systems.
- Extract the downloaded package to an appropriate directory. After extraction, you will find a directory named cangjie in your current working path, containing all components of the Cangjie toolchain. The runtime directory under cangjie contains all dynamic libraries for the Cangjie runtime.
- Developers can choose one of the following methods to deploy the runtime based on their environment and preferences (replace ${CANGJIE_HOME} with the path to the cangjie directory and ${hw_arch} with the corresponding hardware architecture):
  - For Windows Command Prompt (CMD) environments, execute:
set "PATH=${CANGJIE_HOME}\runtime\lib\windows_x86_64_cjnative;%PATH%;"
  - For PowerShell environments, execute:
$env:PATH = "${CANGJIE_HOME}\runtime\lib\windows_x86_64_cjnative;" + $env:Path
  - For MSYS shell, bash, or similar environments, execute (the existing PATH is kept so that other commands remain available):
export PATH=${CANGJIE_HOME}/runtime/lib/windows_x86_64_cjnative:${PATH}
Running the Cangjie Executable
Direct Execution
Linux / macOS
- First, please refer to the Deploying the Cangjie Runtime section to complete the runtime library deployment.
- Copy the compiled executable file main to the target environment and execute it.
./main
Note: The executable main compiled using cjpm is located in the target/release/bin directory.
Windows
- First, please refer to the Deploying the Cangjie Runtime section to complete the runtime library deployment.
- Copy the compiled executable file main.exe to the target environment and execute it.
.\main.exe
Note: The executable main.exe compiled using cjpm is located in the target\release\bin directory.
Using cjpm to Run
Developers commonly use cjpm to manage, compile, and run Cangjie projects.
Developers can install the complete Cangjie toolchain on the target environment by following the Installing the Cangjie Toolchain section. After installation, copy the entire Cangjie project to the target environment and use the cjpm run command to execute the project.
cjc Compilation Options
This chapter introduces commonly used cjc compilation options. If an option is also applicable to cjc-frontend, it will be marked with a [frontend] superscript; if the behavior differs between cjc and cjc-frontend, additional explanations will be provided.
- Options starting with two hyphens are long options, such as --xxxx. If a long option has an optional parameter, the option and parameter must be connected with an equals sign, e.g., --xxxx=<value>. If a long option has a mandatory parameter, the option and parameter can be separated by either a space or an equals sign, e.g., --xxxx <value> is equivalent to --xxxx=<value>.
- Options starting with one hyphen are short options, such as -x. For short options, if they are followed by a parameter, the option and parameter can be separated by a space or not, e.g., -x <value> is equivalent to -x<value>.
Basic Options
--output-type=[exe|staticlib|dylib|obj] [frontend]
Specifies the type of the output file. In exe mode, an executable file is generated; in staticlib mode, a static library file (.a file) is generated; in dylib mode, a dynamic library file is generated (.so on Linux, .dll on Windows, and .dylib on macOS); in obj mode, an intermediate object file is generated (.o on Linux and macOS, .obj on Windows).
Note:
The obj mode is an experimental feature, and using --output-type=obj may entail potential risks. This option must be used in conjunction with the --experimental option. In particular, the obj mode needs to be paired with the --compile-target option described below (see the --compile-target section for detailed usage).
cjc defaults to exe mode.
In addition to compiling .cj files into an executable, they can also be compiled into static or dynamic libraries. For example:
$ cjc tool.cj --output-type=dylib
This compiles tool.cj into a dynamic library. On Linux, cjc generates a dynamic library file named libtool.so.
Note: If an executable program links to a Cangjie dynamic library file, the --dy-std option must also be specified. For details, refer to the --dy-std option description.
[frontend] In cjc-frontend, the compilation process stops at LLVM IR, so the output is always a .bc file. However, different --output-type values still affect the frontend compilation strategy.
--compile-target=[exe|staticlib|dylib] [frontend]
This option is exclusively applicable to the --output-type=obj mode, with a default value of exe. Since the generated .obj/.o files are compilation intermediate products, specifying the --compile-target option enables the compiler to adopt the corresponding compilation strategy, thus generating intermediate files tailored for different types of final products. The compiler can directly take these .obj/.o files as input for subsequent linking processes.
Note:
This is an experimental feature with potential risks, and it must be used in conjunction with the --experimental option.
For example, the following commands implement step-by-step compilation and linking to generate an executable file:
// main.cj
main(){
println("hello cangjie")
}
# Specify --output-type as obj and explicitly set --compile-target to exe
cjc main.cj --output-type=obj --experimental -o main.o --compile-target=exe
# Link the intermediate product into an executable file
cjc main.o -lcangjie-std-core -o main --experimental
In the second step (linking), the parameter -lcangjie-std-core specifies the standard library dependency required by the program. In manual linking scenarios, the names of dependent libraries must strictly follow the prescribed naming convention (e.g., -lcangjie-std-math, -lcangjie-std-collection.concurrent, etc.). Failure to comply with this naming convention will result in undefined symbol errors.
Critical Notes:
- This option does not support scenarios where a .o file is used as the input while the --output-type=obj option is specified again. Invalid usage example: cjc main.o --output-type=obj --compile-target=exe.
- When --output-type is set to a non-obj type, the --compile-target option will not take effect (and will be ignored). Invalid usage example: cjc main.cj --output-type=exe --compile-target=dylib.
- If the input file is solely a .o intermediate target file, the --output-type configured in the current step (with a default value of exe) must be logically consistent with the --compile-target specified when generating the .o file. For instance, if a .o file is compiled with --compile-target=dylib (intended for dynamic library generation), then --output-type should also be set to dylib during the linking phase. Otherwise, linking failures or mismatches in product types may occur.
--package, -p [frontend]
Compiles a package. When using this option, a directory must be specified as input, and the source files in the directory must belong to the same package.
For example, given the file log/printer.cj:
package log
public func printLog(message: String) {
println("[Log]: ${message}")
}
And the file main.cj:
import log.*
main() {
printLog("Everything is great")
}
You can compile the log package using:
$ cjc -p log --output-type=staticlib
cjc will generate a liblog.a file in the current directory.
Then, you can use the liblog.a file to compile main.cj with the following command:
$ cjc main.cj liblog.a
cjc will compile main.cj and liblog.a together into an executable named main.
--module-name <value> [frontend]
Note:
This option is deprecated and will be removed in the future. Using this option in the current version has no functional effect.
--output <value>, -o <value>, -o<value> [frontend]
Specifies the output file path. The compiler’s output will be written to the specified file.
For example, the following command specifies the output executable name as a.out:
cjc main.cj -o a.out
--library <value>, -l <value>, -l<value>
Specifies the library file to link.
The given library file will be passed directly to the linker. This option is typically used in conjunction with --library-path <value>.
The filename format should be lib[arg].[extension]. When linking library a, you can use the option -l a. The linker will search for files like liba.a, liba.so (or liba.dll on Windows) in the library search directories and link them as needed.
--library-path <value>, -L <value>, -L<value>
Specifies the directory containing the library files to link.
When using --library <value>, this option is typically also needed to specify the directory containing the library files.
The path specified by --library-path <value> will be added to the linker’s library search path. Additionally, paths specified in the LIBRARY_PATH environment variable will also be added, but paths specified via --library-path take precedence over those in LIBRARY_PATH.
For example, given a dynamic library file libcProg.so compiled from the following C source:
#include <stdio.h>
void printHello() {
printf("Hello World\n");
}
The Cangjie file main.cj:
foreign func printHello(): Unit
main(): Int64 {
unsafe {
printHello()
}
return 0
}
You can compile main.cj and link the cProg library using:
cjc main.cj -L . -l cProg
cjc will output an executable named main. Running main will produce:
$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./main
Hello World
Note: Since a dynamic library is used, the library directory must be added to $LD_LIBRARY_PATH to ensure dynamic linking at runtime.
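The search order described for --library-path and LIBRARY_PATH, together with the lib[arg].[extension] naming rule, can be pictured with a small Python model. This is an illustrative sketch only: the function names are hypothetical, and which file the linker picks when both a static and a dynamic library exist is not modeled.

```python
import os


def library_search_dirs(cli_dirs, environ=os.environ):
    """Directories searched for -l libraries, highest priority first."""
    env_value = environ.get("LIBRARY_PATH", "")
    env_dirs = [d for d in env_value.split(os.pathsep) if d]
    return list(cli_dirs) + env_dirs  # --library-path entries come first


def candidate_files(name, extensions=(".a", ".so")):
    """File names that -l <name> may match, following lib[arg].[extension]."""
    return [f"lib{name}{ext}" for ext in extensions]


print(library_search_dirs(["."], {"LIBRARY_PATH": "/usr/local/lib"}))
print(candidate_files("cProg"))
```

For the example above, -L . -l cProg leads the linker to look for libcProg.a or libcProg.so in the current directory before any LIBRARY_PATH entry.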
-g [frontend]
Generates an executable or library file with debug information.
Note:
-g can only be used with -O0. Higher optimization levels may cause debugging features to malfunction.
--trimpath <value> [frontend]
Removes the specified prefix from source file paths in debug information.
When compiling Cangjie code, cjc saves absolute paths of source files (.cj files) for debugging and exception handling. This option removes the specified prefix from these paths.
Multiple --trimpath options can be used to specify different prefixes. For each source file path, the compiler removes the first matching prefix.
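As a rough illustration of the "first matching prefix" rule, the following Python sketch trims a source path against a list of prefixes. It assumes that prefixes are tried in the order the options were given, which the text above does not state explicitly. The helper name trim_path and the sample paths are hypothetical.

```python
def trim_path(source_path, prefixes):
    """Remove the first prefix that matches source_path.

    Assumption: prefixes are tried in the order they were passed.
    A path that matches no prefix is returned unchanged.
    """
    for prefix in prefixes:
        if source_path.startswith(prefix):
            return source_path[len(prefix):]
    return source_path
```

With prefixes ["/opt/", "/home/user/", "/home/user/project/"], the path /home/user/project/src/main.cj is reduced using /home/user/ only, because that prefix matches first; the longer prefix is never tried.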
--coverage [frontend]
Generates an executable that supports code coverage statistics. The compiler generates a .gcno file for each compilation unit. After execution, each unit produces a .gcda file. Using these files with the cjcov tool generates a coverage report.
Note:
--coverage can only be used with -O0. Higher optimization levels trigger a warning, and -O0 is enforced. This option is for executables; using it for static or dynamic libraries may cause linking errors.
--int-overflow=[throwing|wrapping|saturating] [frontend]
Specifies the overflow strategy for fixed-precision integer operations. Defaults to throwing.
- throwing: Throws an exception on overflow.
- wrapping: Wraps around to the other end of the integer range.
- saturating: Clamps to the minimum or maximum value of the integer type.
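To make the three strategies concrete, here is a small Python model of 8-bit signed addition under each of them. It is a sketch of the arithmetic only, not Cangjie code and not the compiler's implementation; add_int8 is a hypothetical helper, and OverflowError stands in for whatever exception the runtime throws.

```python
INT8_MIN, INT8_MAX = -128, 127

def add_int8(a, b, strategy="throwing"):
    """Add two values as if they were 8-bit signed integers."""
    result = a + b
    if INT8_MIN <= result <= INT8_MAX:
        return result  # no overflow: every strategy agrees
    if strategy == "throwing":
        raise OverflowError(f"{a} + {b} overflows an 8-bit signed integer")
    if strategy == "wrapping":
        # Two's-complement wrap-around into [-128, 127]
        return (result - INT8_MIN) % 256 + INT8_MIN
    if strategy == "saturating":
        return INT8_MAX if result > INT8_MAX else INT8_MIN
    raise ValueError(f"unknown strategy: {strategy}")
```

For 127 + 1, the wrapping strategy yields -128 and the saturating strategy yields 127, while the throwing strategy raises an error.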
--diagnostic-format=[default|noColor|json] [frontend]
Note:
Windows versions currently do not support colored error messages.
Specifies the format for error messages. Defaults to default.
- default: Standard format with colors.
- noColor: Standard format without colors.
- json: JSON format.
--verbose, -V [frontend]
Prints compiler version information, toolchain dependencies, and commands executed during compilation.
--help, -h [frontend]
Prints available compilation options.
When this option is used, the compiler only prints option information and does not compile any input files.
--version, -v [frontend]
Prints compiler version information.
When this option is used, the compiler only prints version information and does not compile any input files.
--save-temps <value>
Retains intermediate files generated during compilation and saves them to the specified path.
The compiler retains intermediate files like .bc and .o.
--import-path <value> [frontend]
Specifies the search path for imported module AST files.
For example, given the following directory structure where libs/myModule contains the myModule library files and AST export files for the log package:
.
├── libs
| └── myModule
| ├── myModule.log.cjo
| └── libmyModule.log.a
└── main.cj
And the main.cj file:
import myModule.log.printLog
main() {
    printLog("Everything is great")
}
You can add ./libs to the AST file search path using --import-path ./libs. cjc will use ./libs/myModule/myModule.log.cjo for semantic checking and compilation of main.cj.
--import-path provides the same functionality as the CANGJIE_PATH environment variable, but paths specified via --import-path take precedence.
--scan-dependency [frontend]
The --scan-dependency command outputs direct dependencies and other information for a package’s source code or .cjo file in JSON format.
// this file is placed under directory pkgA
macro package pkgA
import pkgB.*
import std.io.*
import pkgB.subB.*
cjc --scan-dependency --package pkgA
Or:
cjc --scan-dependency pkgA.cjo
{
"package": "pkgA",
"isMacro": true,
"dependencies": [
{
"package": "pkgB",
"isStd": false,
"imports": [
{
"file": "pkgA/pkgA.cj",
"begin": {
"line": 2,
"column": 1
},
"end": {
"line": 2,
"column": 14
}
}
]
},
{
"package": "pkgB.subB",
"isStd": false,
"imports": [
{
"file": "pkgA/pkgA.cj",
"begin": {
"line": 4,
"column": 1
},
"end": {
"line": 4,
"column": 19
}
}
]
},
{
"package": "std.io",
"isStd": true,
"imports": [
{
"file": "pkgA/pkgA.cj",
"begin": {
"line": 3,
"column": 1
},
"end": {
"line": 3,
"column": 16
}
}
]
}
]
}
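Because the report is plain JSON, it can be consumed by ordinary tooling. The Python sketch below extracts the non-standard-library dependencies from a report shaped like the one above. The sample is abridged (the "imports" arrays are emptied for brevity), and non_std_dependencies is a hypothetical helper, not part of cjc.

```python
import json

# Abridged copy of the report shown above; "imports" entries are left empty.
SAMPLE_REPORT = """
{
  "package": "pkgA",
  "isMacro": true,
  "dependencies": [
    {"package": "pkgB", "isStd": false, "imports": []},
    {"package": "pkgB.subB", "isStd": false, "imports": []},
    {"package": "std.io", "isStd": true, "imports": []}
  ]
}
"""

def non_std_dependencies(report_text):
    """Return the names of dependencies that are not standard-library packages."""
    report = json.loads(report_text)
    return [dep["package"] for dep in report["dependencies"] if not dep["isStd"]]
```

A build script could use such a list to decide which packages must be compiled before pkgA.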
--no-sub-pkg [frontend]
Indicates that the current compilation package has no sub-packages.
Enabling this option allows the compiler to further reduce code size.
--warn-off, -Woff <value> [frontend]
Disables all or specific categories of compilation warnings.
<value> can be all or a predefined warning category. When set to all, no warnings are printed; when set to a specific category, warnings of that category are suppressed.
Each warning includes a #note line indicating its category and how to disable it. Use --help to list all available warning categories.
--warn-on, -Won <value> [frontend]
Enables all or specific categories of compilation warnings.
<value> for --warn-on follows the same rules as --warn-off. This option is typically combined with --warn-off; e.g., -Woff all -Won <value> enables only warnings of the specified category.
Important: The order of --warn-on and --warn-off matters. For the same category, the latter option overrides the former. For example, -Won <value> -Woff all disables all warnings.
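The ordering rule can be modeled as a left-to-right fold over the options. The Python sketch below assumes that every category starts out enabled, which the text does not confirm for all categories. The category names used in the usage note are made up for illustration; use --help for the real list.

```python
def enabled_categories(all_categories, options):
    """Apply -Won/-Woff options in order; later options override earlier ones.

    options is a list of (flag, value) pairs, e.g. [("-Woff", "all")].
    Assumption: all categories are enabled before any option is applied.
    """
    enabled = set(all_categories)
    for flag, value in options:
        targets = set(all_categories) if value == "all" else {value}
        if flag == "-Won":
            enabled |= targets
        elif flag == "-Woff":
            enabled -= targets
        else:
            raise ValueError(f"unexpected flag: {flag}")
    return enabled
```

With the hypothetical categories {"unused", "deprecated"}, the sequence -Woff all -Won unused leaves only "unused" enabled, while -Won unused -Woff all leaves nothing enabled.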
--error-count-limit <value> [frontend]
Limits the maximum number of errors printed by the compiler.
<value> can be all or a non-negative integer. When all, all errors are printed; when an integer N, at most N errors are printed. Defaults to 8.
--output-dir <value> [frontend]
Controls the directory for intermediate and final output files.
Specifies the directory for intermediate files like .cjo. If both --output-dir <path1> and --output <path2> are specified, intermediate files go to <path1>, and the final output goes to <path1>/<path2>.
Note:
When used with --output, the --output parameter must be a relative path.
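The path composition described above amounts to joining the two values. The Python sketch below illustrates this; final_output_path is a hypothetical helper, and raising ValueError for an absolute --output value is only a stand-in for whatever diagnostic the compiler actually reports.

```python
import os

def final_output_path(output_dir, output):
    """Combine --output-dir <path1> and --output <path2> into <path1>/<path2>."""
    if os.path.isabs(output):
        # Stand-in for the compiler's own diagnostic (assumption).
        raise ValueError("--output must be a relative path when used with --output-dir")
    return os.path.join(output_dir, output)
```

For example, --output-dir build together with --output bin/main places the final output under build/bin/main.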
--static
Statically links Cangjie libraries.
This option only applies when compiling executables.
Note:
--static is effective on Linux, Windows and macOS.
--static-std
Statically links the Cangjie standard library (std module).
This option only takes effect when compiling dynamic libraries or executables.
When compiling executables (--output-type=exe), cjc defaults to statically linking the std module.
--dy-std
Dynamically links the Cangjie standard library (std module).
This option only takes effect when compiling dynamic libraries or executable files.
When compiling a dynamic library (i.e., when --output-type=dylib is specified), cjc defaults to dynamically linking the std module of the Cangjie library.
Important Notes:
- When both --static-std and --dy-std are used together, only the last specified option takes effect.
- When compiling an executable program that links to a Cangjie dynamic library (i.e., a product compiled with the --output-type=dylib option), the --dy-std option must be explicitly specified to dynamically link the standard library. Otherwise, multiple copies of the standard library may appear in the program, potentially causing runtime issues.
- Platform-specific support details are as follows:
| Target Platform | Product Supports --static-std | Product Supports --dy-std |
|---|---|---|
| Linux | Supported | Supported |
| macOS | Supported | Supported |
| Windows | Supported | Supported |
| OpenHarmony | Not supported | Supported |
| Android | Supported | Supported |
--static-libs
Note:
This option is deprecated and will be removed in the future. Using this option in the current version has no functional effect.
--dy-libs
Note:
This option is deprecated and will be removed in the future. Using this option in the current version has no functional effect.
--stack-trace-format=[default|simple|all]
Specifies the exception stack trace printing format, which controls the display of stack frame information when an exception is thrown. The default format is default.
The stack trace formats are described as follows:
- default format: Function name with generic parameters omitted (filename:line number)
- simple format: filename:line number
- all format: Full function name (filename:line number)
--lto=[full|thin]
Enables and specifies the LTO (Link Time Optimization) compilation mode.
Important Notes:
- This feature is not supported on Windows and macOS platforms.
- When LTO (Link Time Optimization) is enabled, it cannot be combined with the -Os or -Oz optimization options.
LTO optimization supports two compilation modes:
- --lto=full: full LTO merges all compilation modules together and performs global optimization. This mode offers the highest optimization potential but requires longer compilation time.
- --lto=thin: Compared to full LTO, thin LTO uses parallel optimization across multiple modules and supports incremental linking by default. It has shorter compilation time than full LTO but less optimization effectiveness due to reduced global information.
  - Typical optimization effectiveness comparison: full LTO > thin LTO > conventional static linking compilation.
  - Typical compilation time comparison: full LTO > thin LTO > conventional static linking compilation.
LTO optimization use cases:
- Compile an executable file using the following command:
$ cjc test.cj --lto=full
or
$ cjc test.cj --lto=thin
- Compile a static library (.bc file) required for LTO mode and use it in executable file compilation:
# Generate a static library as a .bc file
$ cjc pkg.cj --lto=full --output-type=staticlib -o libpkg.bc
# Input the .bc file along with the source file to the Cangjie compiler for executable compilation
$ cjc test.cj libpkg.bc --lto=full
Note:
In LTO mode, the path to the static library (.bc file) must be provided to the Cangjie compiler.
- In LTO mode, when statically linking the standard library (--static-std), the standard library code participates in LTO optimization and is statically linked into the executable. When dynamically linking the standard library (--dy-std), the dynamic library from the standard library is still used for linking in LTO mode.
# Static linking: standard library code participates in LTO optimization
$ cjc test.cj --lto=full --static-std
# Dynamic linking: dynamic library is used for linking; standard library code does not participate in LTO optimization
$ cjc test.cj --lto=full --dy-std
--compile-as-exe
Enabling this option suppresses the symbol visibility of .bc files loaded in LTO mode, leaving only the package init symbol visible. LLVM’s native optimization passes then perform aggressive dead symbol stripping based on this visibility rule. This option is valid only when the --lto option is enabled.
# Compiles successfully
$ cjc test.cj --lto=[full|thin] --compile-as-exe
# Compilation fails
$ cjc test.cj --compile-as-exe
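The LTO-related constraints stated in this section (--compile-as-exe requires --lto, and --lto cannot be combined with -Os or -Oz) can be collected into a small pre-flight check, for example in a build script. The Python sketch below is such a check under those stated rules only. check_lto_flags and its messages are hypothetical and do not reproduce cjc's own diagnostics.

```python
def check_lto_flags(flags):
    """Return a list of human-readable problems found in a cjc-style flag list."""
    problems = []
    lto_flags = [f for f in flags if f.startswith("--lto")]
    for f in lto_flags:
        if f not in ("--lto=full", "--lto=thin"):
            problems.append(f"unknown LTO mode: {f}")
    if "--compile-as-exe" in flags and not lto_flags:
        problems.append("--compile-as-exe requires --lto=[full|thin]")
    if lto_flags:
        for size_flag in ("-Os", "-Oz"):
            if size_flag in flags:
                problems.append(f"{size_flag} cannot be combined with --lto")
    return problems
```

An empty list means that none of the modeled rules is violated; it does not guarantee that the compilation succeeds.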
--pgo-instr-gen, --pgo-instr-gen=<.profraw>
Enables instrumentation compilation, generating an executable program with instrumentation information.
- If <.profraw> is provided, profile information will be written to the file at the specified path.
- If <.profraw> is not provided, profile information will be written to default.profraw in the current directory where the Cangjie program is executed.
This feature is temporarily unsupported when compiling for macOS and Windows targets.
PGO (Profile-Guided Optimization) is a common compilation optimization technique that uses runtime profiling information to further improve program performance. Instrumentation-based PGO is a PGO optimization method that uses instrumentation information. It typically involves three steps:
- The compiler performs instrumentation compilation on the source code, generating an instrumented executable program.
- The instrumented executable program is run to generate a profile.
- The compiler uses the profile to recompile the source code.
# Generate an executable program `test` with instrumentation information for execution statistics
$ cjc test.cj --pgo-instr-gen -o test
# Run the executable program `test` to generate the `default.profraw` profile
$ ./test
# Generate an executable program `test` with instrumentation information and specific profile file path and name
$ cjc test.cj --pgo-instr-gen=./cjpgo/cj.profraw -o test
# Run the executable program `test` to generate the `./cjpgo/cj.profraw` profile
$ ./test
--pgo-instr-use=<.profdata>
Uses the specified profdata profile to guide compilation and generate an optimized executable program.
This feature is temporarily unsupported when compiling for macOS targets.
Note:
The --pgo-instr-use compilation option only supports profiles in profdata format. The llvm-profdata tool can be used to convert profraw profiles to profdata profiles.
# Convert a `profraw` file to a `profdata` file
$ LD_LIBRARY_PATH=$CANGJIE_HOME/third_party/llvm/lib:$LD_LIBRARY_PATH $CANGJIE_HOME/third_party/llvm/bin/llvm-profdata merge default.profraw -o default.profdata
# Use the specified `default.profdata` profile to guide compilation and generate the optimized executable program `testOptimized`
$ cjc test.cj --pgo-instr-use=default.profdata -o testOptimized
--target <value> [frontend]
Specifies the target platform triple for compilation.
The <value> parameter is typically a string in the following format: <arch>(-<vendor>)-<os>(-<env>), where:
- <arch> represents the system architecture of the target platform, such as aarch64, x86_64, etc.
- <vendor> represents the vendor of the target platform, such as apple. If the vendor is unspecified or irrelevant, it is often written as unknown or omitted.
- <os> represents the operating system of the target platform, such as Linux, Win32, etc.
- <env> represents the ABI or standard specification of the target platform, used to distinguish different runtime environments of the same OS, such as gnu, musl. If the OS does not require finer-grained distinction, this field can also be omitted.
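Because both <vendor> and <env> are optional, a three-part triple is ambiguous without extra knowledge (compare x86_64-apple-darwin with x86_64-linux-gnu). The Python sketch below shows one way to split a triple, using a small set of known vendor names to resolve that ambiguity. The vendor set and the helper parse_triple are assumptions made for this example, not the compiler's parsing logic.

```python
# Vendor names mentioned in the text above; a real parser would know more.
KNOWN_VENDORS = {"apple", "unknown"}

def parse_triple(triple):
    """Split <arch>(-<vendor>)-<os>(-<env>) into its fields."""
    parts = triple.split("-")
    if len(parts) < 2:
        raise ValueError(f"not a target triple: {triple}")
    arch, rest = parts[0], parts[1:]
    vendor = None
    # Treat the second field as a vendor only if it is a known vendor name
    # and an <os> field still follows it.
    if len(rest) >= 2 and rest[0] in KNOWN_VENDORS:
        vendor, rest = rest[0], rest[1:]
    os_name = rest[0]
    env = "-".join(rest[1:]) or None
    return {"arch": arch, "vendor": vendor, "os": os_name, "env": env}
```

Under these assumptions, x86_64-linux-gnu has no vendor and the environment gnu, while aarch64-apple-ios-simulator has the vendor apple, the OS ios, and the environment simulator.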
Currently, cjc supports the following host and target platforms for cross-compilation:
| Host Platform | Target Platform |
|---|---|
| x86_64-linux-gnu | x86_64-windows-gnu |
| aarch64-linux-gnu | x86_64-windows-gnu |
| x86_64-windows-gnu | aarch64-linux-ohos |
| x86_64-windows-gnu | x86_64-linux-ohos |
| x86_64-apple-darwin | aarch64-linux-ohos |
| x86_64-apple-darwin | x86_64-linux-ohos |
| aarch64-apple-darwin | aarch64-linux-ohos |
| aarch64-apple-darwin | arm-linux-android23 |
| aarch64-apple-darwin | aarch64-apple-ios-simulator |
| aarch64-apple-darwin | x86_64-apple-ios-simulator |
| x86_64-linux-gnu | aarch64-linux-android26 |
| x86_64-apple-darwin | aarch64-linux-android26 |
| x86_64-windows-gnu | aarch64-linux-android26 |
Before using --target to specify a target platform for cross-compilation, ensure that the corresponding cross-compilation toolchain and a compatible Cangjie SDK version for the target platform are available on the host platform.
--target-cpu <value>
Note:
This option is experimental. Binaries generated with this option may have potential runtime issues. Use this option with caution. This option must be used with the --experimental option.
Specifies the CPU type of the compilation target.
When specifying the CPU type, the compiler attempts to use instruction sets specific to that CPU type and applies optimizations tailored for it. Binaries generated for a specific CPU type may lose portability and might not run on other CPUs (even those with the same architecture instruction set).
This option supports the following tested CPU types:
x86-64 Architecture:
- generic
aarch64 Architecture:
- generic
- tsv110
generic is the universal CPU type. When generic is specified, the compiler generates universal instructions for the architecture. Such binaries can run on various CPUs of the same architecture (assuming the OS and dynamic dependencies are consistent), regardless of the specific CPU type. The default value for --target-cpu is generic.
This option also supports the following CPU types, but they are untested. Binaries generated for these CPU types may have runtime issues.
x86-64 Architecture:
- alderlake
- amdfam10
- athlon
- athlon-4
- athlon-fx
- athlon-mp
- athlon-tbird
- athlon-xp
- athlon64
- athlon64-sse3
- atom
- barcelona
- bdver1
- bdver2
- bdver3
- bdver4
- bonnell
- broadwell
- btver1
- btver2
- c3
- c3-2
- cannonlake
- cascadelake
- cooperlake
- core-avx-i
- core-avx2
- core2
- corei7
- corei7-avx
- geode
- goldmont
- goldmont-plus
- haswell
- i386
- i486
- i586
- i686
- icelake-client
- icelake-server
- ivybridge
- k6
- k6-2
- k6-3
- k8
- k8-sse3
- knl
- knm
- lakemont
- nehalem
- nocona
- opteron
- opteron-sse3
- penryn
- pentium
- pentium-m
- pentium-mmx
- pentium2
- pentium3
- pentium3m
- pentium4
- pentium4m
- pentiumpro
- prescott
- rocketlake
- sandybridge
- sapphirerapids
- silvermont
- skx
- skylake
- skylake-avx512
- slm
- tigerlake
- tremont
- westmere
- winchip-c6
- winchip2
- x86-64
- x86-64-v2
- x86-64-v3
- x86-64-v4
- yonah
- znver1
- znver2
- znver3
aarch64 Architecture:
- a64fx
- ampere1
- apple-a10
- apple-a11
- apple-a12
- apple-a13
- apple-a14
- apple-a7
- apple-a8
- apple-a9
- apple-latest
- apple-m1
- apple-s4
- apple-s5
- carmel
- cortex-a34
- cortex-a35
- cortex-a510
- cortex-a53
- cortex-a55
- cortex-a57
- cortex-a65
- cortex-a65ae
- cortex-a710
- cortex-a72
- cortex-a73
- cortex-a75
- cortex-a76
- cortex-a76ae
- cortex-a77
- cortex-a78
- cortex-a78c
- cortex-r82
- cortex-x1
- cortex-x1c
- cortex-x2
- cyclone
- exynos-m3
- exynos-m4
- exynos-m5
- falkor
- kryo
- neoverse-512tvb
- neoverse-e1
- neoverse-n1
- neoverse-n2
- neoverse-v1
- saphira
- thunderx
- thunderx2t99
- thunderx3t110
- thunderxt81
- thunderxt83
- thunderxt88
In addition to the above CPU types, this option supports native as the current CPU type. The compiler attempts to identify the host machine’s CPU type and uses it as the target type for binary generation.
--toolchain <value>, -B <value>, -B<value>
Specifies the path to the binary files in the compilation toolchain.
Binary files include: compilers, linkers, and target files provided by the toolchain (e.g., crt0.o, crti.o, etc.).
After preparing the compilation toolchain, you can place it in a custom path and pass this path to the compiler using --toolchain <value>, enabling the compiler to invoke the binaries in that path for cross-compilation.
--sysroot <value>
Specifies the root directory path of the compilation toolchain.
For cross-compilation toolchains with fixed directory structures, if there is no need to specify paths for binaries and libraries outside this directory, you can directly use --sysroot <value> to pass the toolchain’s root directory path to the compiler. The compiler will analyze the corresponding directory structure based on the target platform and automatically search for required binaries and libraries. Using this option eliminates the need to specify --toolchain or --library-path.
For example, when cross-compiling for a platform with the triple arch-os-env, and the cross-compilation toolchain has the following directory structure:
/usr/sdk/arch-os-env
├── bin
| ├── arch-os-env-gcc (cross-compiler)
| ├── arch-os-env-ld (linker)
| └── ...
├── lib
| ├── crt1.o (C runtime target files)
| ├── crti.o
| ├── crtn.o
| ├── libc.so (dynamic libraries)
| ├── libm.so
| └── ...
└── ...
Given a Cangjie source file hello.cj, you can use the following command to cross-compile hello.cj for the arch-os-env platform:
cjc --target=arch-os-env --toolchain /usr/sdk/arch-os-env/bin --toolchain /usr/sdk/arch-os-env/lib --library-path /usr/sdk/arch-os-env/lib hello.cj -o hello
Alternatively, you can use shorthand parameters:
cjc --target=arch-os-env -B/usr/sdk/arch-os-env/bin -B/usr/sdk/arch-os-env/lib -L/usr/sdk/arch-os-env/lib hello.cj -o hello
If the toolchain’s directory structure follows conventional patterns, you can omit --toolchain and --library-path and use the following command:
cjc --target=arch-os-env --sysroot /usr/sdk/arch-os-env hello.cj -o hello
--strip-all, -s
When compiling an executable or dynamic library, this option removes the symbol table from the output file.
--discard-eh-frame
When compiling an executable or dynamic library, this option removes the eh_frame section and partial information from the eh_frame_hdr section (excluding crt-related information), reducing the size of the executable or dynamic library but affecting debugging information.
This feature is temporarily unsupported when compiling for macOS targets.
--set-runtime-rpath
Writes the absolute path of the Cangjie runtime library directory to the RPATH/RUNPATH section of the binary. Using this option eliminates the need to set the LD_LIBRARY_PATH (for Linux) or DYLD_LIBRARY_PATH (for macOS) environment variables to locate the Cangjie runtime library when running the program in the build environment.
This feature is unsupported when compiling for Windows targets.
--link-option <value> [1]
Specifies a linker option.
cjc passes the value of this option as an argument to the linker. Available arguments vary depending on the linker (system or specified). Multiple linker options can be specified by using --link-option multiple times.
--link-options <value> [1]
Specifies linker options.
cjc passes the arguments of this option directly to the linker. Available arguments vary depending on the linker (system or specified). Multiple linker options can be specified by using --link-options multiple times.
[1] Linker passthrough options may vary depending on the linker. Refer to the linker documentation for supported options.
--disable-reflection
Disable reflection, i.e., do not generate reflection information during compilation.
Note:
When cross-compiling for the aarch64-linux-ohos target, reflection information is disabled by default, and this option has no effect.
--profile-compile-time [frontend]
Outputs the time consumption data of each compilation phase into a file with the suffix .time.prof: if the output directory is specified, the file will be saved there; if output specifies a file, the .time.prof file will be created in the same directory as that file.
--profile-compile-memory [frontend]
Outputs the memory consumption data of each compilation phase into a file with the suffix .mem.prof: if the output directory is specified, the file will be saved there; if output specifies a file, the .mem.prof file will be created in the same directory as that file.
--sanitize=[address|thread|hwaddress]
Enables the Sanitizer compile-time instrumentation feature, detects various errors in the program during runtime, and links the corresponding Sanitizer runtime libraries. Prior to using this option, you need to download the dedicated SDK package with Sanitizer support (e.g., cangjie-sdk-linux-aarch64-sanitizer.tar.gz) and ensure the proper deployment of the SDK.
- --sanitize=address: Detects memory errors, corresponding to the cangjie/runtime/lib/linux_aarch64_cjnative/asan directory in the SDK.
- --sanitize=thread: Detects data races, corresponding to the cangjie/runtime/lib/linux_aarch64_cjnative/tsan/tsan directory in the SDK.
- --sanitize=hwaddress: Detects illegal hardware-level memory access behaviors, corresponding to the cangjie/runtime/lib/linux_aarch64_cjnative/hwasan directory in the SDK.
Usage Examples:
cjc --sanitize=address main.cj -o main
# Manually specify the runtime library path
export LD_LIBRARY_PATH=${CANGJIE_HOME}/runtime/lib/arch/<sanitizer-name>:$LD_LIBRARY_PATH
# Run the program
./main
Note:
The --sanitize option cannot be used in conjunction with the --compile-macro option; otherwise, a compilation error will be triggered.
--sanitize-set-rpath
This option automatically configures the search path for the corresponding Sanitizer runtime libraries, eliminating the need for users to execute an additional export LD_LIBRARY_PATH=${CANGJIE_HOME}/runtime/lib/arch/[asan|tsan|hwasan]:$LD_LIBRARY_PATH command.
For example, the above compilation command can be simplified to:
cjc --sanitize=address main.cj -o main --sanitize-set-rpath
Unit Test Options
--test [frontend]
An entry point provided by the unittest framework, automatically generated by macros. When compiling with the cjc --test option, the program entry point is no longer main but test_entry. For usage of the unittest framework, refer to the Cangjie Programming Language Library API documentation.
For the Cangjie file a.cj in the pkgc directory:
import std.unittest.*
import std.unittest.testmacro.*
@Test
public class TestA {
    @TestCase
    public func case1(): Unit {
        print("case1\n")
    }
}
You can compile a.cj in the pkgc directory using:
cjc a.cj --test
Executing main will produce the following output:
Note:
Execution time for test cases is not guaranteed to be consistent across runs.
case1
--------------------------------------------------------------------------------------------------
TP: default, time elapsed: 29710 ns, Result:
TCS: TestA, time elapsed: 26881 ns, RESULT:
[ PASSED ] CASE: case1 (16747 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
For the following directory structure:
application
├── src
├── pkgc
| ├── a1.cj
| └── a2.cj
└── a3.cj
You can use the -p compilation option to compile the entire package in the application directory:
cjc --test -p pkgc
This compiles the test cases in a1.cj and a2.cj of the pkgc package.
/*a1.cj*/
package pkgc
import std.unittest.*
import std.unittest.testmacro.*
@Test
public class TestA {
    @TestCase
    public func caseA(): Unit {
        print("case1\n")
    }
}
/*a2.cj*/
package pkgc
import std.unittest.*
import std.unittest.testmacro.*
@Test
public class TestB {
    @TestCase
    public func caseB(): Unit {
        throw IndexOutOfBoundsException()
    }
}
Executing main will produce the following output (output is for reference only):
case1
--------------------------------------------------------------------------------------------------
TP: a, time elapsed: 367800 ns, Result:
TCS: TestA, time elapsed: 16802 ns, RESULT:
[ PASSED ] CASE: caseA (14490 ns)
TCS: TestB, time elapsed: 347754 ns, RESULT:
[ ERROR ] CASE: caseB (345453 ns)
REASON: An exception has occurred:IndexOutOfBoundsException
at std/core.Exception::init()(std/core/exception.cj:23)
at std/core.IndexOutOfBoundsException::init()(std/core/index_out_of_bounds_exception.cj:9)
at a.TestB::caseB()(/home/houle/cjtest/application/pkgc/a2.cj:7)
at a.lambda.1()(/home/houle/cjtest/application/pkgc/a2.cj:7)
at std/unittest.TestCases::execute()(std/unittest/test_case.cj:92)
at std/unittest.UT::run(std/unittest::UTestRunner)(std/unittest/test_runner.cj:194)
at std/unittest.UTestRunner::doRun()(std/unittest/test_runner.cj:78)
at std/unittest.UT::run(std/unittest::UTestRunner)(std/unittest/test_runner.cj:200)
at std/unittest.UTestRunner::doRun()(std/unittest/test_runner.cj:78)
at std/unittest.UT::run(std/unittest::UTestRunner)(std/unittest/test_runner.cj:200)
at std/unittest.UTestRunner::doRun()(std/unittest/test_runner.cj:75)
at std/unittest.entryMain(std/unittest::TestPackage)(std/unittest/entry_main.cj:11)
Summary: TOTAL: 2
PASSED: 1, SKIPPED: 0, ERROR: 1
FAILED: 0
--------------------------------------------------------------------------------------------------
--test-only [frontend]
The --test-only option is used to compile only the test portion of a package.
When enabled, the compiler will only compile test files (ending with _test.cj) in the package.
Note:
When using this option, the same package should first be compiled in regular mode, then dependencies should be added via the -L/-l linking options or by including the .bc files when using the LTO option. Otherwise, the compiler will report missing dependency symbols.
Example:
/*main.cj*/
package my_pkg
func concatM(s1: String, s2: String): String {
    return s1 + s2
}
main(): Int64 {
    println(concatM("a", "b"))
    0
}
/*main_test.cj*/
package my_pkg
@Test
class Tests {
    @TestCase
    public func case1(): Unit {
        @Expect("ac", concatM("a", "c"))
    }
}
Compilation commands:
# Compile the production part of the package first; only `main.cj` is compiled here
cjc -p my_pkg --output-type=staticlib -o=output/libmain.a
# Compile the test part of the package; only `main_test.cj` is compiled here
cjc -p my_pkg --test-only -L output -lmain --import-path output
--mock <on|off|runtime-error> [frontend]
If on is passed, the package will enable mock compilation, allowing classes in the package to be mocked in test cases. off explicitly disables mocking.
Note:
Mock support is automatically enabled in test mode (when --test is enabled), and the --mock option does not need to be explicitly passed.
runtime-error is only available in test mode (when --test is enabled). It allows compiling packages with mock code but does not perform any mock-related processing in the compiler (which may introduce overhead and affect test runtime performance). This can be useful for benchmarking test cases with mock code. Avoid compiling and running tests with mock code when using this option, as it will throw runtime exceptions.
Macro Options
cjc supports the following macro options. For more details on macros, refer to the Macros section.
--compile-macro [frontend]
Compile macro definition files to generate default macro definition dynamic library files.
--debug-macro [frontend]
Generate Cangjie code files after macro expansion. This option can be used to debug macro expansion functionality.
--parallel-macro-expansion [frontend]
Enable parallel macro expansion. This option can reduce macro expansion compilation time.
Conditional Compilation Options
cjc supports the following conditional compilation options. For more details on conditional compilation, refer to Conditional Compilation.
--cfg <value> [frontend]
Specify custom compilation conditions.
Parallel Compilation Options
cjc supports the following parallel compilation options for improved compilation efficiency.
--jobs <value>, -j <value> [frontend]
Set the maximum number of parallel jobs allowed during parallel compilation. value must be a reasonable non-negative integer. If value exceeds the hardware’s maximum parallel capability, the compiler will use the hardware’s maximum capability.
If this option is not set, the compiler will automatically calculate the maximum number of parallel jobs based on hardware capabilities.
Note:
--jobs 1 indicates fully serial compilation.
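The clamping behavior described for --jobs can be sketched in Python as follows. The sketch assumes that "hardware capability" can be approximated by the logical CPU count and that an unset value falls back to that count; neither detail is specified above. effective_jobs is a hypothetical helper, and the handling of a value of 0 is not stated in the text, so it is not given special treatment here.

```python
import os

def effective_jobs(requested=None, hardware_max=None):
    """Return the number of parallel jobs that would be used.

    requested: the --jobs value, or None when the option is not set.
    hardware_max: override for the detected hardware limit (for testing).
    """
    if hardware_max is None:
        hardware_max = os.cpu_count() or 1  # assumption: logical CPU count
    if requested is None:
        return hardware_max  # assumption: default to the hardware limit
    if requested < 0:
        raise ValueError("--jobs expects a non-negative integer")
    return min(requested, hardware_max)
```

On a machine modeled with a limit of 8, a request for 64 jobs is reduced to 8, and a request for 1 job stays at 1 (fully serial compilation).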
--aggressive-parallel-compile, --apc, --aggressive-parallel-compile=<value>, --apc=<value> [frontend]
When enabled, the compiler adopts a more aggressive strategy (which may impact optimization and reduce program runtime performance) to achieve higher compilation efficiency. value is an optional parameter indicating the maximum number of parallel jobs for aggressive parallel compilation:
- If value is provided, it must be a reasonable non-negative integer. If value exceeds the hardware’s maximum parallel capability, the compiler will automatically calculate the maximum number of parallel jobs. It is recommended to set value to a non-negative integer less than the number of physical CPU cores.
- If value is not provided, aggressive parallel compilation is enabled by default, and the number of parallel jobs matches --jobs.
Additionally, if the same code is compiled twice with different value settings or different states of this option, the compiler does not guarantee binary consistency between the outputs.
Rules for enabling/disabling aggressive parallel compilation:
- Aggressive parallel compilation is forcibly disabled in the following scenarios:
  - --fobf-string
  - --fobf-const
  - --fobf-layout
  - --fobf-cf-flatten
  - --fobf-cf-bogus
  - --lto
  - --coverage
  - Compiling for Windows targets
  - Compiling for macOS targets
- If --aggressive-parallel-compile=<value> or --apc=<value> is used:
  - value <= 1: Disables aggressive parallel compilation.
  - value > 1: Enables aggressive parallel compilation, with the number of parallel jobs determined by value.
- If --aggressive-parallel-compile or --apc is used without value, aggressive parallel compilation is enabled by default, and the number of parallel jobs matches --jobs.
- If this option is not set, the compiler defaults based on the scenario:
  - -O0: Aggressive parallel compilation is enabled by default, with the number of parallel jobs matching --jobs. It can be disabled using --aggressive-parallel-compile=<value> or --apc=<value> with value <= 1.
  - Levels other than -O0: Aggressive parallel compilation is disabled by default. It can be enabled using --aggressive-parallel-compile=<value> or --apc=<value> with value > 1.
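The enabling rules above can be summarized as a single decision function. The Python sketch below encodes them in the listed order: forced-disable scenarios first, then an explicit value, then the flag without a value, then the default for the optimization level. It is a reading aid under the stated rules; apc_enabled, its parameters, and the target_os strings are hypothetical.

```python
FORCE_DISABLE_FLAGS = {
    "--fobf-string", "--fobf-const", "--fobf-layout",
    "--fobf-cf-flatten", "--fobf-cf-bogus", "--coverage",
}

def apc_enabled(opt_level="-O0", apc=None, other_flags=(), target_os="linux"):
    """Decide whether aggressive parallel compilation would be active.

    apc: None  -> option not given
         True  -> --apc given without a value
         int   -> --apc=<value>
    """
    if target_os in ("windows", "macos"):
        return False
    for flag in other_flags:
        if flag in FORCE_DISABLE_FLAGS or flag.startswith("--lto"):
            return False
    if apc is None:
        return opt_level == "-O0"  # enabled by default only at -O0
    if apc is True:
        return True  # enabled, job count follows --jobs
    return apc > 1  # value <= 1 disables, value > 1 enables
```

For example, a plain -O2 build has it disabled, -O2 with --apc=4 has it enabled, and any build that also passes --lto=full has it disabled regardless of --apc.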
Optimization Options
--fchir-constant-propagation [frontend]
Enable CHIR constant propagation optimization.
--fno-chir-constant-propagation [frontend]
Disable CHIR constant propagation optimization.
--fchir-function-inlining [frontend]
Enable CHIR function inlining optimization.
--fno-chir-function-inlining [frontend]
Disable CHIR function inlining optimization.
--fchir-devirtualization [frontend]
Enable CHIR devirtualization optimization.
--fno-chir-devirtualization [frontend]
Disable CHIR devirtualization optimization.
--fast-math [frontend]
When enabled, the compiler makes aggressive (and potentially precision-losing) assumptions about floating-point operations to optimize them.
-O<N> [frontend]
Set the code optimization level.
Higher optimization levels generate more efficient code but may increase compilation time. cjc defaults to O0 optimization. Supported levels: O0, O1, O2, Os, Oz.
At optimization level 2, cjc enables the following options:
- --fchir-constant-propagation
- --fchir-function-inlining
- --fchir-devirtualization
At optimization level s, cjc performs O2 optimizations and additionally optimizes for code size.
At optimization level z, cjc performs Os optimizations and further reduces code size.
Note:
When using Os or Oz, the link-time optimization option --lto=[full|thin] cannot be used.
-O [frontend]
Equivalent to -O1.
Code Obfuscation Options
cjc supports code obfuscation for additional security. Obfuscation is disabled by default.
--fobf-string
Enable string obfuscation.
Obfuscates string constants in the code, preventing static reading of string data from binaries.
--fno-obf-string
Disable string obfuscation.
--fobf-const
Enable constant obfuscation.
Obfuscates numeric constants by replacing them with equivalent but more complex arithmetic instruction sequences.
--fno-obf-const
Disable constant obfuscation.
--fobf-layout
Enable layout obfuscation.
Obfuscates symbols (including function and global variable names), path names, line numbers, and function layout order. When enabled, cjc generates a symbol mapping output file *.obf.map in the current directory. If --obf-sym-output-mapping is specified, its value will be used as the output filename. The mapping file contains the relationship between original and obfuscated symbols, allowing deobfuscation.
Note:
Layout obfuscation conflicts with parallel compilation. Avoid enabling both simultaneously.
--fno-obf-layout
Disable layout obfuscation.
--obf-sym-prefix <string>
Specify a prefix string for obfuscated symbols.
When set, all obfuscated symbols will include this prefix. Useful for avoiding symbol conflicts when obfuscating multiple Cangjie packages.
--obf-sym-output-mapping <file>
Specify the output file for symbol obfuscation mappings.
The file records original symbol names, obfuscated names, and file paths for deobfuscation.
--obf-sym-input-mapping <file,...>
Specify input files for symbol obfuscation mappings.
These files define how symbols should be obfuscated. When compiling interdependent packages, use the mapping files from dependent packages to ensure consistent obfuscation.
--obf-apply-mapping-file <file>
Provide a custom symbol obfuscation mapping file.
File format:
<original_symbol_name> <new_symbol_name>
- original_symbol_name consists of fields (e.g., module, package, class, function, or variable names) separated by '.'.
- For functions, append parameter types in '()'. For generic types, append type parameters in '<>'.
The compiler will replace original_symbol_name with new_symbol_name. Symbols not in the file will be randomly obfuscated. Conflicts with --obf-sym-input-mapping will cause compilation errors.
--fobf-export-symbols
Allow obfuscation of exported symbols. Enabled by default when layout obfuscation is active.
--fno-obf-export-symbols
Disable obfuscation of exported symbols.
--fobf-source-path
Allow obfuscation of path information in symbols. Enabled by default when layout obfuscation is active.
When enabled, path names in stack traces are replaced with "SOURCE".
--fno-obf-source-path
Disable obfuscation of path information in stack traces.
--fobf-line-number
Enable obfuscation of line numbers in stack traces.
When enabled, line numbers are replaced with 0.
--fno-obf-line-number
Disable obfuscation of line numbers in stack traces.
--fobf-cf-flatten
Enable control flow flattening obfuscation.
Obfuscates existing control flow to make it more complex.
--fno-obf-cf-flatten
Disable control flow flattening obfuscation.
--fobf-cf-bogus
Enable bogus control flow obfuscation.
Inserts fake control flow to complicate code logic.
--fno-obf-cf-bogus
Disable bogus control flow obfuscation.
--fobf-all
Enable all obfuscation features.
Equivalent to:
- --fobf-string
- --fobf-const
- --fobf-layout
- --fobf-cf-flatten
- --fobf-cf-bogus
--obf-config <file>
Specify a configuration file for code obfuscation.
The file can exclude specific functions or symbols from obfuscation.
File format:
obf_func1 name1
obf_func2 name2
...
- obf_func: Obfuscation feature (e.g., obf-cf-bogus, obf-cf-flatten, obf-const, obf-layout).
- name: Target to exclude, composed of fields (e.g., package, class, function names) separated by '.'. For functions, append parameter types in '()'.
Wildcards are supported:
- ?: Matches a single character.
- *: Matches any number of characters (excluding separators and parameters).
- **: Matches any number of characters (including separators and parameters; must be a standalone field).
- ...: Matches any number of parameters.
- ***: Matches a single parameter of any type.
Example rules:
- obf-cf-flatten pro?.myfunc(): Excludes pro0.myfunc() but not pro00.myfunc().
- * pro0.**: Excludes all functions/v
Secure Compilation Options
cjc generates position-independent code by default and produces position-independent executables when compiling executable files.
When building Release versions, it is recommended to enable/disable compilation options according to the following rules to enhance security.
Enable --trimpath <value> [frontend]
Removes the specified absolute path prefix from debugging and exception information. Using this option prevents build path information from being written into the binary.
After enabling this option, source code path information in the binary is usually no longer complete, which may affect debugging. It is recommended to disable this option when building debug versions.
Enable --strip-all, -s
Removes the symbol table from the binary. Using this option deletes symbol-related information that is not required during runtime.
After enabling this option, the binary cannot be debugged. Please disable this option when building debug versions.
Disable --set-runtime-rpath
If the executable will be distributed to different environments for execution, or if other regular users have write permissions to the directory of the currently used Cangjie runtime library, enabling this option may pose security risks. Therefore, disable this option.
This option is not applicable when compiling Windows targets.
Enable --link-options "-z noexecstack"
Sets the thread stack to non-executable.
Only available when compiling Linux targets.
Enable --link-options "-z relro"
Makes the relocations of the GOT (Global Offset Table) read-only.
Only available when compiling Linux targets.
Enable --link-options "-z now"
Enables immediate binding.
Only available when compiling Linux targets.
Code Coverage Instrumentation Options
Note:
Windows and macOS versions currently do not support code coverage instrumentation options.
Cangjie supports code coverage instrumentation (SanitizerCoverage, hereafter referred to as SanCov), providing interfaces consistent with LLVM’s SanitizerCoverage. The compiler inserts coverage feedback functions at the function level or BasicBlock level. Users only need to implement the agreed callback functions to monitor program execution status during runtime.
Cangjie’s SanCov functionality operates at the package level, meaning the entire package is either fully instrumented or not instrumented at all.
--sanitizer-coverage-level=0/1/2
Instrumentation level: 0 means no instrumentation; 1 means function-level instrumentation, inserting callback functions only at function entry points; 2 means BasicBlock-level instrumentation, inserting callback functions at various BasicBlocks.
If not specified, the default value is 2.
This compilation option only affects the instrumentation level of --sanitizer-coverage-trace-pc-guard, --sanitizer-coverage-inline-8bit-counters, and --sanitizer-coverage-inline-bool-flag.
--sanitizer-coverage-trace-pc-guard
Enabling this option inserts a function call __sanitizer_cov_trace_pc_guard(uint32_t *guard_variable) at each Edge, influenced by sanitizer-coverage-level.
Note: This feature differs from gcc/llvm implementations: it does not insert void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) in the constructor. Instead, it inserts the function call uint32_t *__cj_sancov_pc_guard_ctor(uint64_t edgeCount) during package initialization.
The __cj_sancov_pc_guard_ctor callback function must be implemented by the developer. Packages with SanCov enabled will call this callback as early as possible. The input parameter is the number of Edges in the package, and the return value is typically a memory region created by calloc.
If __sanitizer_cov_trace_pc_guard_init needs to be called, it is recommended to call it within __cj_sancov_pc_guard_ctor, using dynamically created buffers to compute the function’s input parameters and return value.
A standard implementation of __cj_sancov_pc_guard_ctor is as follows:
uint32_t *__cj_sancov_pc_guard_ctor(uint64_t edgeCount) {
    // One zero-initialized guard per Edge
    uint32_t *p = (uint32_t *) calloc(edgeCount, sizeof(uint32_t));
    __sanitizer_cov_trace_pc_guard_init(p, p + edgeCount);
    return p;
}
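To show what the developer-side callbacks might look like, the sketch below follows the guard convention used in LLVM's SanitizerCoverage examples: every guard receives a non-zero id at initialization, and a guard is zeroed once its Edge has been reported. The covered_edges counter is a hypothetical piece of bookkeeping added for illustration, and the code is not thread-safe.

```c
#include <stdint.h>

/* Number of distinct Edges seen so far (illustrative, not thread-safe). */
static uint64_t covered_edges = 0;

/* Assign each guard a non-zero id; a zero guard means "already reported". */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t next_id = 0;
    if (start == stop || *start) {
        return; /* empty range, or this range was initialized before */
    }
    for (uint32_t *g = start; g < stop; g++) {
        *g = ++next_id;
    }
}

/* Called at each instrumented Edge; counts an Edge the first time it runs. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (*guard == 0) {
        return;
    }
    covered_edges++;
    *guard = 0;
}
```

Because the guard is cleared after the first hit, later executions of the same Edge return immediately, which keeps the overhead of repeated calls low.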
--sanitizer-coverage-inline-8bit-counters
Enabling this option inserts an 8-bit counter at each Edge. Each time an Edge is traversed, the counter increments by one, influenced by sanitizer-coverage-level.
Note: This feature differs from gcc/llvm implementations: it does not insert void __sanitizer_cov_8bit_counters_init(char *start, char *stop) in the constructor. Instead, it inserts the function call uint8_t *__cj_sancov_8bit_counters_ctor(uint64_t edgeCount) during package initialization.
The __cj_sancov_8bit_counters_ctor callback function must be implemented by the developer. Packages with SanCov enabled will call this callback as early as possible. The input parameter is the number of Edges in the package, and the return value is typically a memory region created by calloc.
If __sanitizer_cov_8bit_counters_init needs to be called, it is recommended to call it within __cj_sancov_8bit_counters_ctor, using dynamically created buffers to compute the function’s input parameters and return value.
A standard implementation of __cj_sancov_8bit_counters_ctor is as follows:
uint8_t *__cj_sancov_8bit_counters_ctor(uint64_t edgeCount) {
    // One zero-initialized counter per Edge
    uint8_t *p = (uint8_t *) calloc(edgeCount, sizeof(uint8_t));
    __sanitizer_cov_8bit_counters_init(p, p + edgeCount);
    return p;
}
--sanitizer-coverage-inline-bool-flag
Enabling this option inserts a boolean flag at each Edge. The boolean flag corresponding to a traversed Edge is set to True, influenced by sanitizer-coverage-level.
Note: This feature differs from gcc/llvm implementations: it does not insert void __sanitizer_cov_bool_flag_init(bool *start, bool *stop) in the constructor. Instead, it inserts the function call bool *__cj_sancov_bool_flag_ctor(uint64_t edgeCount) during package initialization.
The __cj_sancov_bool_flag_ctor callback function must be implemented by the developer. Packages with SanCov enabled will call this callback as early as possible. The input parameter is the number of Edges in the package, and the return value is typically a memory region created by calloc.
If __sanitizer_cov_bool_flag_init needs to be called, it is recommended to call it within __cj_sancov_bool_flag_ctor, using dynamically created buffers to compute the function’s input parameters and return value.
A standard implementation of __cj_sancov_bool_flag_ctor is as follows:
bool *__cj_sancov_bool_flag_ctor(uint64_t edgeCount) {
    // One zero-initialized flag per Edge
    bool *p = (bool *) calloc(edgeCount, sizeof(bool));
    __sanitizer_cov_bool_flag_init(p, p + edgeCount);
    return p;
}
--sanitizer-coverage-pc-table
This compilation option provides the correspondence between instrumentation points and source code, currently only offering function-level correspondence. It must be used in conjunction with --sanitizer-coverage-trace-pc-guard, --sanitizer-coverage-inline-8bit-counters, or --sanitizer-coverage-inline-bool-flag. At least one of these options must be enabled, and multiple can be enabled simultaneously.
Note: This feature differs from gcc/llvm implementations: it does not insert void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg, const uintptr_t *pcs_end); in the constructor. Instead, it inserts the function call void __cj_sancov_pcs_init(int8_t *packageName, uint64_t n, int8_t **funcNameTable, int8_t **fileNameTable, uint64_t *lineNumberTable) during package initialization. The parameters are as follows:
- int8_t *packageName: A string representing the package name (instrumentation uses C-style int8 arrays for strings).
- uint64_t n: Indicates that n functions are instrumented.
- int8_t **funcNameTable: A string array of length n, where the i-th instrumentation point corresponds to the function name funcNameTable[i].
- int8_t **fileNameTable: A string array of length n, where the i-th instrumentation point corresponds to the file name fileNameTable[i].
- uint64_t *lineNumberTable: A uint64 array of length n, where the i-th instrumentation point corresponds to the line number lineNumberTable[i].
If __sanitizer_cov_pcs_init needs to be called, you must manually convert Cangjie’s pc-table to C-language pc-table.
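As an illustration of consuming these tables, the following sketch stores the pointers passed to __cj_sancov_pcs_init and formats one entry on request. The describe_entry helper and the static variables are hypothetical. The sketch assumes the tables stay valid for the lifetime of the program, and it keeps only the most recently registered package.

```c
#include <stdint.h>
#include <stdio.h>

/* Tables handed over by the instrumented package (kept by reference, not copied). */
static const char *pkg_name;
static uint64_t entry_count;
static int8_t **func_names;
static int8_t **file_names;
static uint64_t *line_numbers;

void __cj_sancov_pcs_init(int8_t *packageName, uint64_t n, int8_t **funcNameTable,
                          int8_t **fileNameTable, uint64_t *lineNumberTable) {
    pkg_name = (const char *) packageName;
    entry_count = n;
    func_names = funcNameTable;
    file_names = fileNameTable;
    line_numbers = lineNumberTable;
}

/* Hypothetical helper: writes "package:function file:line" for entry i.
 * Returns the snprintf result, or -1 when i is out of range. */
int describe_entry(uint64_t i, char *out, size_t out_size) {
    if (i >= entry_count) {
        return -1;
    }
    return snprintf(out, out_size, "%s:%s %s:%llu", pkg_name,
                    (const char *) func_names[i], (const char *) file_names[i],
                    (unsigned long long) line_numbers[i]);
}
```

A fuzzer or coverage viewer that expects LLVM's pc-table would still need its own conversion step; this helper only makes the Cangjie-side data readable.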
--sanitizer-coverage-stack-depth
Enabling this compilation option inserts a call to __updateSancovStackDepth at each function entry point. Cangjie cannot read the stack pointer (SP) itself, so this function must be implemented on the C side, where the stack pointer is accessible.
A standard implementation of __updateSancovStackDepth is as follows:
thread_local void* __sancov_lowest_stack;
void __updateSancovStackDepth()
{
    register void* sp = __builtin_frame_address(0);
    if (sp < __sancov_lowest_stack) {
        __sancov_lowest_stack = sp;
    }
}
--sanitizer-coverage-trace-compares
Enabling this option inserts callback functions before all compare and match instructions. The list below matches LLVM’s API functionality. Refer to Tracing data flow.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);
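A minimal sketch of two of these callbacks is shown below. It assumes that Cangjie passes the same arguments as LLVM, in particular that Cases[0] holds the number of case values, Cases[1] the bit width of Val, and the case values start at Cases[2], as described in LLVM's data-flow tracing documentation. The counters and the matching_bits32 helper are hypothetical and not thread-safe.

```c
#include <stdint.h>

static uint64_t cmp_calls = 0;       /* how many 32-bit comparisons were traced */
static int best_matching_bits = 0;   /* closest comparison so far, out of 32 bits */
static uint64_t switch_hits = 0;     /* traced switch values that matched a case */

/* Number of bit positions in which a and b agree. */
static int matching_bits32(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;
    int same = 32;
    while (diff) {
        same -= (int) (diff & 1u);
        diff >>= 1;
    }
    return same;
}

void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2) {
    int m = matching_bits32(Arg1, Arg2);
    cmp_calls++;
    if (m > best_matching_bits) {
        best_matching_bits = m;
    }
}

void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
    uint64_t n = Cases[0];           /* assumed LLVM layout: count, bit width, values */
    for (uint64_t i = 0; i < n; i++) {
        if (Cases[2 + i] == Val) {
            switch_hits++;
            return;
        }
    }
}
```

A fuzzer would typically use such feedback to favor inputs whose compared operands are closer to each other.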
--sanitizer-coverage-trace-memcmp
This compilation option provides prefix comparison feedback for String and Array comparisons. Enabling this option inserts callback functions before String and Array comparison functions. The following APIs for Strings and Arrays will insert corresponding stub functions:
- String==: __sanitizer_weak_hook_memcmp
- String.startsWith: __sanitizer_weak_hook_memcmp
- String.endsWith: __sanitizer_weak_hook_memcmp
- String.indexOf: __sanitizer_weak_hook_strstr
- String.replace: __sanitizer_weak_hook_strstr
- String.contains: __sanitizer_weak_hook_strstr
- CString==: __sanitizer_weak_hook_strcmp
- CString.startsWith: __sanitizer_weak_hook_memcmp
- CString.endsWith: __sanitizer_weak_hook_strncmp
- CString.compare: __sanitizer_weak_hook_strcmp
- CString.equalsLower: __sanitizer_weak_hook_strcasecmp
- Array==: __sanitizer_weak_hook_memcmp
- ArrayList==: __sanitizer_weak_hook_memcmp
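The sketch below shows one way a developer might implement the memcmp hook to obtain prefix feedback. It uses the hook signature from LLVM's sanitizer interface. Whether Cangjie passes exactly these arguments is an assumption here, and the longest_prefix variable is hypothetical.

```c
#include <stddef.h>

/* Longest common prefix observed across all traced comparisons (illustrative). */
static size_t longest_prefix = 0;

void __sanitizer_weak_hook_memcmp(void *caller_pc, const void *s1, const void *s2,
                                  size_t n, int result) {
    const unsigned char *a = (const unsigned char *) s1;
    const unsigned char *b = (const unsigned char *) s2;
    size_t i = 0;
    (void) caller_pc; /* unused in this sketch */
    (void) result;    /* unused in this sketch */
    while (i < n && a[i] == b[i]) {
        i++;          /* count the bytes that agree from the start */
    }
    if (i > longest_prefix) {
        longest_prefix = i;
    }
}
```

Comparing "cangjie" with "canvas" over 6 bytes, for instance, records a common prefix of 3.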
Experimental Feature Options
--enable-eh [frontend]
Enabling this option allows Cangjie to support Effect Handlers, an advanced control flow mechanism for implementing modular, resumable side-effect handling.
Effect Handlers enable programmers to decouple side-effect operations from their handling logic, resulting in cleaner, more composable code. This mechanism enhances abstraction levels, particularly for handling operations like logging, I/O, and state changes, preventing main flow contamination by side-effect logic.
Effects work similarly to exception handling but use perform and handle instead of throw and catch. Each effect must be defined by inheriting from the stdx.effect.Command class.
Unlike traditional exception mechanisms, Effect Handlers can choose to resume execution after handling an effect, injecting a value back into the original call site and continuing execution. This “resume” capability allows finer control over program flow, making it ideal for building simulators, interpreters, or cooperative multitasking systems requiring high control.
Example:
import stdx.effect.Command

// Define a Command named GetNumber
class GetNumber <: Command<Int64> {}

main() {
    try {
        println("About to perform")
        // Perform the GetNumber effect
        let a = perform GetNumber()
        // Execution resumes here after handler
        println("It is resumed, a = ${a}")
    } handle(e: GetNumber) {
        // Handle the GetNumber effect
        println("It is performed")
        // Resume execution, injecting value 9
        resume with 9
    }
    0
}
In this example, a new Command subclass GetNumber is defined.
- In the main function, the try-handle structure handles the effect.
- The try block first prints a message ("About to perform"), then performs the effect with perform GetNumber(). The return value of perform is assigned to variable a. Performing an effect jumps execution to the handle block capturing this effect.
- The handle block captures and handles the GetNumber effect, printing a message ("It is performed") and using resume with 9 to inject the constant 9 back into the original call site, resuming execution after perform to print ("It is resumed, a = 9").
Output:
About to perform
It is performed
It is resumed, a = 9
Note:
- Effect Handlers are currently experimental. This option may change in future versions; use with caution.
- Using Effect Handlers requires importing the stdx.effect library.
--experimental [frontend]
Enables experimental features, allowing the use of other experimental options on the command line.
Note:
Binaries generated using experimental features may have potential runtime issues. Be aware of the risks when using this option.
Compiler Plugin Options
--plugin <value> [frontend]
Provides compiler plugin capability. As an experimental feature, it is currently only for internal validation and does not support custom plugin development. Using this option may cause errors.
Other Features
Compiler Error Message Colors
For the Windows version of the Cangjie compiler, error messages are displayed in color only when running on Windows 10 version 1511 (Build 10586) or later. Otherwise, no colors are displayed.
Setting build-id
Use --link-options "--build-id=<arg>" to pass linker options for setting build-id.
This feature is not supported when compiling Windows targets.
Setting rpath
Use --link-options "-rpath=<arg>" to pass linker options for setting rpath.
This feature is not supported when compiling Windows targets.
Incremental Compilation
Enable incremental compilation with --incremental-compile [frontend]. When enabled, cjc uses cache files from previous compilations to speed up the current compilation.
Note:
This is an experimental feature. Binaries generated with this option may have potential runtime issues; use with caution. This option must be used with --experimental. Specifying this option saves incremental compilation cache and logs to the .cached directory under the output file path.
Output CHIR
Use --emit-chir=[raw|opt] [frontend] to specify output of serialized CHIR compilation phase products. raw outputs CHIR before compiler optimization, opt outputs CHIR after optimization. Using --emit-chir defaults to outputting optimized CHIR.
--no-prelude [frontend]
Disables automatic import of the standard library core package.
Note:
This option can only be used when compiling the Cangjie standard library core package, not for other Cangjie code compilation scenarios.
Dump AST
You can dump the AST using --dump-ast [frontend]. By default, the output is written to files: a folder named after the package (or after the product name specified with -o) with the suffix _AST is created in the output directory, and the files in it are named number_phase_ast.txt. Adding --dump-to-screen [frontend] will dump the output to the screen.
Dump CHIR
You can dump CHIR using --dump-chir [frontend]. By default, the output is written to files: a folder named after the package (or after the product name specified with -o) with the suffix _CHIR is created in the output directory, and the files in it are named number_phase.chirtxt. Adding --dump-to-screen [frontend] will dump the output to the screen.
Dump LLVM IR
You can dump LLVM IR using --dump-ir [frontend]. By default, the output is written to files: a folder named after the package (or after the product name specified with -o) with the suffix _IR is created in the output directory. Inside the *_IR directory, subfolders named number_phase are created, and the files in them are named submoduleNumber-packageName.ll. The numbering and quantity of submodules depend on the compilation concurrency. Adding --dump-to-screen [frontend] will dump the output to the screen.
Dump AST, CHIR, LLVM IR
You can dump AST, CHIR, and LLVM IR using --dump-all [frontend]. By default, the output is written to files: folders named after the package (or after the product name specified with -o) with the suffixes _AST, _CHIR, and _IR are created in the output directory. Adding --dump-to-screen [frontend] will dump the output to the screen.
Dump content to the screen
You can use --dump-to-screen [frontend] together with frontend-related dump options (such as --dump-ast [frontend], --dump-chir [frontend], --dump-ir [frontend], and --dump-all [frontend]) to dump the corresponding intermediate representation text content to the screen.
Note:
When outputting to the screen, only the final result is displayed. When outputting to files, the output directory will contain folders with suffixes _AST, _CHIR, and _IR to store detailed information about intermediate processes.
Environment Variables Used by cjc
This section introduces some environment variables that the Cangjie compiler may use during the code compilation process.
TMPDIR or TMP
The Cangjie compiler places temporary files generated during compilation in a temporary directory. By default, Linux and macOS place them in /tmp, while Windows places them in C:\Windows\Temp. The Cangjie compiler also supports custom temporary file directories. On Linux and macOS, set the TMPDIR environment variable; on Windows, set the TMP environment variable.
Example: In Linux shell:
export TMPDIR=/home/xxxx
In Windows cmd:
set TMP=D:\xxxx
Linux Distribution Support and Installation for the Toolchain
The Cangjie toolchain has undergone comprehensive functional testing on the following Linux distributions:
- Ubuntu 18.04
- Ubuntu 20.04
- UnionTech OS Server 20
- Kylin Linux Advanced Server Release V10
Dependency Installation Commands for Cangjie Toolchain Across Linux Distributions
Note:
Certain tools required by the current Cangjie toolchain may not be directly installable through default system repositories on some Linux distributions. Please refer to the next section Compiling and Installing Dependency Tools for manual installation instructions.
Additionally, OpenSSL 3 needs to be installed. Refer to Compiling and Installing Dependency Tools for installation methods.
Ubuntu 18.04
$ apt-get install \
binutils \
libc-dev \
libc++-dev \
libgcc-7-dev
Additionally, OpenSSL 3 needs to be installed. Refer to Compiling and Installing Dependency Tools for installation methods.
Ubuntu 20.04
$ apt-get install \
binutils \
libc-dev \
libc++-dev \
libgcc-9-dev
Additionally, OpenSSL 3 needs to be installed. Refer to Compiling and Installing Dependency Tools for installation methods.
UnionTech OS Server 20
$ yum install \
binutils \
glibc-devel \
libstdc++-devel \
gcc
Additionally, OpenSSL 3 needs to be installed. Refer to Compiling and Installing Dependency Tools for installation methods.
Kylin Linux Advanced Server Release V10
$ yum install \
binutils \
glibc-devel \
libstdc++-devel \
gcc
Additionally, OpenSSL 3 needs to be installed. Refer to Compiling and Installing Dependency Tools for installation methods.
Other Linux Distributions
Depending on the Linux distribution in use, you may need to refer to the dependency installation commands for the above systems and use the system’s package manager to install corresponding dependencies. If the system does not provide the required packages, you may need to manually install linkers, C development tools, C++ development tools, GCC compilers, and OpenSSL 3 to use the Cangjie toolchain properly.
Compiling and Installing Dependency Tools
Some standard libraries (and certain tools) in the current Cangjie toolchain utilize the OpenSSL 3 open-source software. For scenarios where the system package manager does not provide OpenSSL 3, users may need to compile and install OpenSSL 3 from source. This section provides the methods and steps for compiling OpenSSL 3 from source.
OpenSSL 3
Download the OpenSSL 3 source code from the following links:
OpenSSL 3.0.7 or later is recommended.
Note:
Please carefully read the following notes before executing the compilation and installation commands, and adjust the commands according to your actual situation. Incorrect configuration and installation may render other system software unusable. If you encounter issues during compilation and installation or wish to perform additional configuration, refer to the INSTALL file in the OpenSSL source code or OpenSSL's FAQ.
Using OpenSSL 3.0.7 as an example, after downloading, extract the archive with the following command:
$ tar xf openssl-3.0.7.tar.gz
After extraction, enter the directory:
$ cd openssl-3.0.7
Compile OpenSSL:
Note:
If OpenSSL is already installed on the system, it is recommended to use the --prefix=<path> option to specify a custom installation path, such as --prefix=/usr/local/openssl-3.0.7 or a developer's personal directory. Directly compiling and installing with the following commands in a system where OpenSSL already exists may overwrite the system OpenSSL, causing applications dependent on it to become unusable.
$ ./Configure --libdir=lib
$ make
Test OpenSSL:
$ make test
Install OpenSSL to the system directory (or the previously specified --prefix directory). Root privileges may be required to successfully execute the following command:
$ make install
or
$ sudo make install
If a custom installation path was not specified via --prefix during OpenSSL compilation, the installation is now complete. If a custom path was specified via --prefix, the following variables must be set to ensure the Cangjie toolchain can locate OpenSSL 3.
Note:
If other versions of OpenSSL exist on the system, configuring these variables may affect the OpenSSL version used by other compilation and development tools besides the Cangjie toolchain. If OpenSSL incompatibility issues arise with other tools, configure these variables only for the Cangjie development environment.
Replace <prefix> with your specified custom installation path.
$ export LIBRARY_PATH=<prefix>/lib:$LIBRARY_PATH
$ export LD_LIBRARY_PATH=<prefix>/lib:$LD_LIBRARY_PATH
The environment variables configured this way are only effective for the current shell session. To automatically configure them every time the shell starts, add the above commands to $HOME/.bashrc, $HOME/.zshrc, or other shell configuration files (depending on the developer’s shell type).
To make the configuration effective for all users by default, execute the following commands:
Replace <prefix> with your specified custom installation path.
$ echo 'export LIBRARY_PATH=<prefix>/lib:$LIBRARY_PATH' >> /etc/profile
$ echo "<prefix>/lib" >> /etc/ld.so.conf
$ ldconfig
After execution, reopen the shell session for the changes to take effect.
At this point, OpenSSL 3 has been successfully installed. You may return to the previous section to continue reading or attempt to run the Cangjie compiler.
Runtime Environment Variables Manual
This section introduces the environment variables provided by the runtime.
In Linux shell and macOS shell, you can set the environment variables provided by the Cangjie runtime using the following method:
$ export VARIABLE=value
In Windows cmd, you can set the environment variables provided by the Cangjie runtime using the following method:
> set VARIABLE=value
The subsequent examples in this section are based on Linux shell settings. If they do not match your platform, please choose the appropriate environment variable setting method for your platform.
Runtime Initialization Optional Configurations
Notes:
- All integer parameters are of Int64 type, and floating-point parameters are of Float64 type.
- If no maximum value is explicitly specified for a parameter, its implicit maximum is the maximum value of its type.
- If any parameter exceeds the valid range, the setting will be invalid, and the default value will be used automatically.
- All parameters are invalid on the OpenHarmony platform. The Cangjie runtime uses default values on the OpenHarmony platform.
cjHeapSize
Specifies the maximum size of the Cangjie heap. Supported units are kb (KB), mb (MB), and gb (GB). The valid range is [4MB, system physical memory]. Settings outside this range will be invalid, and the default value will be used. If the physical memory is less than 1GB, the default value is 64 MB; otherwise, it is 256 MB. On OpenHarmony and Android platforms, the minimum supported value is 64MB; if the configured value is lower, the default value is applied.
Example:
export cjHeapSize=4GB
cjRegionSize
Specifies the size of the thread-local buffer for the region allocator. Supported units are kb (KB), mb (MB), and gb (GB). The valid range is [4kb, 2048kb]. Settings outside this range will be invalid, and the default value will be used. The default value is 64 KB. On the macOS platform, the minimum supported value is 16KB; if the configured value is lower, the default value is applied.
Example:
export cjRegionSize=1024kb
cjLargeThresholdSize
Objects requiring large contiguous memory spaces (e.g., long arrays) are called large objects. Frequent allocation of large objects in the heap may lead to insufficient contiguous space, triggering heap overflow issues. Increasing the maximum size of large objects can improve the continuity of heap space.
In Cangjie, the threshold for large objects is the smaller of cjLargeThresholdSize and cjRegionSize. cjLargeThresholdSize supports units of kb (KB), mb (MB), and gb (GB), with a valid range of [4KB, 2048KB]. Settings outside this range will be invalid, and the default value will be used. The default value is 32 KB.
Note:
A larger threshold for large objects may impact program performance. Developers should set this value based on actual requirements.
Example:
export cjLargeThresholdSize=1024kb
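The "smaller of the two settings" rule can be written as a one-line computation. The function below is a hypothetical illustration of that arithmetic, with both sizes expressed in bytes; it is not the runtime's implementation.

```c
#include <stdint.h>

/* Effective large-object threshold in bytes: the smaller of
 * cjLargeThresholdSize and cjRegionSize. */
uint64_t large_object_threshold(uint64_t large_threshold_bytes,
                                uint64_t region_size_bytes) {
    return large_threshold_bytes < region_size_bytes ? large_threshold_bytes
                                                     : region_size_bytes;
}
```

With the defaults (32 KB and 64 KB) the threshold is 32 KB. Setting cjLargeThresholdSize=1024kb while cjRegionSize stays at 64 KB yields 64 KB, so raising cjLargeThresholdSize alone has no effect beyond the region size.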
cjExemptionThreshold
Specifies the waterline value for live regions. The value must be in the range (0, 1]. If the size of live objects in a region exceeds the region size multiplied by this value, the region will not be reclaimed (the dead objects in it continue to occupy memory). A higher value increases the likelihood of region reclamation, reducing heap fragmentation but potentially impacting performance due to frequent reclamation. Settings outside the range will be invalid, and the default value will be used. The default value is 0.8 (80%).
Example:
export cjExemptionThreshold=0.8
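As a worked illustration of the waterline check, the hypothetical function below compares a region's live bytes with the region size multiplied by the threshold. It only restates the arithmetic of the description and is not taken from the runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a region is exempt from reclamation, i.e. its live bytes
 * exceed exemption_threshold * region size. */
bool region_is_exempt(uint64_t live_bytes, uint64_t region_bytes,
                      double exemption_threshold) {
    return (double) live_bytes > exemption_threshold * (double) region_bytes;
}
```

For a 64 KB region (65536 bytes) and the default threshold of 0.8, the cut-off is 52428.8 bytes: a region with 60000 live bytes is left alone, while one with 40000 live bytes remains eligible for reclamation.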
cjHeapUtilization
Specifies the utilization rate of the Cangjie heap. This parameter is one of the references for updating the heap waterline after GC. The value must be in the range (0, 1]. The heap waterline is the GC trigger threshold: GC is triggered when the total size of objects in the heap reaches it. A smaller value results in a higher updated heap waterline, reducing the likelihood of GC being triggered. Settings outside the range will be invalid, and the default value will be used. The default value is 0.8 (80%).
Example:
export cjHeapUtilization=0.8
cjHeapGrowth
Specifies the growth rate of the Cangjie heap. This parameter is one of the references for updating the heap waterline after GC. The value must be greater than 0. The growth rate is calculated as 1 + cjHeapGrowth. A higher value results in a higher updated heap waterline, reducing the likelihood of GC being triggered. The default value is 0.15, indicating a growth rate of 1.15.
Example:
export cjHeapGrowth=0.15
cjAlloctionRate
Specifies the object allocation rate of the Cangjie runtime. The value must be greater than 0 and is expressed in MB/s, indicating the amount of object memory that can be allocated per second. The default value is 10240, meaning 10240 MB of objects can be allocated per second.
Example:
export cjAlloctionRate=10240
cjAlloctionWaitTime
Specifies the wait time for object allocation in the Cangjie runtime. The value must be greater than 0. Supported units are s, ms, us, and ns, with nanoseconds (ns) recommended. If the time interval since the last object allocation is less than this value, the allocation will wait. The default value is 1000 ns.
Example:
export cjAlloctionWaitTime=1000ns
cjGCThreshold
Specifies the reference waterline value for the Cangjie heap. Supported units are kb (KB), mb (MB), and gb (GB). The value must be a positive integer. GC is triggered when the Cangjie heap size exceeds this value. The default value is the heap size.
Example:
export cjGCThreshold=20480KB
cjGarbageThreshold
When GC occurs, if the ratio of dead objects in a region exceeds the value of this environment variable, the region is added to the reclamation candidate set and may be reclaimed later (though other policies may prevent reclamation). The value is dimensionless, with a valid range of [0.0, 1.0]. The default value is 0.5.
Example:
export cjGarbageThreshold=0.5
cjGCInterval
Specifies the interval between two GC operations. The value must be greater than 0. Supported units are s, ms, us, and ns, with milliseconds (ms) recommended. If the time since the last GC is less than this value, the current GC will be skipped. This parameter controls the frequency of GC. The default value is 150 ms.
Example:
export cjGCInterval=150ms
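The skip rule above can be sketched in Python as a duration parser plus one comparison. The function names are hypothetical, and falling back to the 150 ms default for a malformed or non-positive value is an assumption; the text above only states that the value must be greater than 0.

```python
import re

NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}
DEFAULT_GC_INTERVAL_NS = 150 * NS_PER_UNIT["ms"]


def parse_duration_ns(text):
    """Parse a value such as '150ms' into nanoseconds; None if invalid."""
    match = re.fullmatch(r"(\d+)(s|ms|us|ns)", text.strip())
    if match is None:
        return None
    value = int(match.group(1)) * NS_PER_UNIT[match.group(2)]
    return value if value > 0 else None


def should_skip_gc(now_ns, last_gc_ns, interval_setting="150ms"):
    """Skip the current GC if the previous one finished too recently."""
    interval = parse_duration_ns(interval_setting)
    if interval is None:
        interval = DEFAULT_GC_INTERVAL_NS  # assumed fallback
    return now_ns - last_gc_ns < interval
```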
cjBackupGCInterval
Specifies the interval for backup GC. The value must be greater than 0. Supported units are s, ms, us, and ns, with seconds (s) recommended. If the Cangjie runtime does not trigger GC within the specified time, a backup GC will be triggered. The default value is 240 seconds (4 minutes).
Example:
export cjBackupGCInterval=240s
cjProcessorNum
Specifies the maximum concurrency of Cangjie threads. The valid range is (0, CPU cores * 2]. Settings outside this range will be invalid, and the default value will be used. The system API is called to obtain the number of CPU cores. If successful, the default value is the number of CPU cores; otherwise, it is 8.
Example:
export cjProcessorNum=2
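The range check and default selection for cjProcessorNum can be sketched in Python as follows. The function name is hypothetical, and using the fallback value 8 for the upper bound when the core count is unavailable is an assumption that the text above does not specify.

```python
def effective_processor_num(setting, cpu_cores):
    """Return the concurrency the runtime would use under the rules above.

    cpu_cores is the core count reported by the system, or None if the
    query failed.
    """
    cores = cpu_cores if cpu_cores else 8  # default: core count, else 8
    try:
        value = int(setting)
    except (TypeError, ValueError):
        return cores
    # Valid range is (0, cores * 2]; anything else falls back to the default.
    return value if 0 < value <= cores * 2 else cores
```

On a 4-core machine, a setting of 8 is accepted, while 9 or 0 falls back to 4.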
cjStackSize
Specifies the stack size of Cangjie threads. Supported units are kb (KB), mb (MB), and gb (GB). The valid range is [64KB, 1GB] on Linux and [128KB, 1GB] on Windows. Settings outside this range will be invalid, and the default value will be used. The default value is 128KB.
Example:
export cjStackSize=100kb
Operational Logging Optional Configurations
MRT_LOG_FILE_SIZE
Specifies the file size for runtime operational logs. The default value is 10 MB. Supported units are kb (KB), mb (MB), and gb (GB). The value must be greater than 0.
When the log size exceeds this value, logging will restart from the beginning of the file.
The final log size will be slightly larger than MRT_LOG_FILE_SIZE.
Example:
export MRT_LOG_FILE_SIZE=100kb
MRT_LOG_PATH
Specifies the output path for runtime operational logs. If this environment variable is not set or the path setting fails, logs will default to stdout (standard output) or stderr (standard error).
Example:
export MRT_LOG_PATH=/home/cangjie/runtime/runtime_log.txt
MRT_LOG_LEVEL
Specifies the minimum output level for runtime operational logs. Logs at or above this level will be printed. The default value is e. Supported values are [v|d|i|w|e|f|s]: v (VERBOSE), d (DEBUG), i (INFO), w (WARNING), e (ERROR), f (FATAL), s (FATAL_WITHOUT_ABORT).
Example:
export MRT_LOG_LEVEL=v
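The "at or above this level" rule can be sketched in Python. The sketch assumes that severity increases in the order in which the values are listed above (so s ranks above f) and that an unrecognized setting falls back to the default e; neither point is stated explicitly in the text, and the function name is hypothetical.

```python
LEVELS = ["v", "d", "i", "w", "e", "f", "s"]  # assumed ascending severity


def is_printed(message_level, min_level="e"):
    """Return True if a message at message_level passes the configured filter."""
    if min_level not in LEVELS:
        min_level = "e"  # assumed fallback to the default level
    return LEVELS.index(message_level) >= LEVELS.index(min_level)
```

With the default setting, error and fatal messages are printed and warnings are suppressed; with MRT_LOG_LEVEL=v, every level is printed.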
MRT_REPORT
Specifies the output path for runtime GC logs. If this environment variable is not set or the path setting fails, logs will not be printed by default.
Example:
export MRT_REPORT=/home/cangjie/runtime/gc_log.txt
MRT_LOG_CJTHREAD
Specifies the output path for cjthread logs. If this environment variable is not set or the path setting fails, logs will not be printed by default.
Example:
export MRT_LOG_CJTHREAD=/home/cangjie/runtime/cjthread_log.txt
cjHeapDumpOnOOM
Specifies whether to generate a heap dump file after an OutOfMemory error. By default, this feature is disabled. Supported values are [on|off]. Setting it to “on” enables the feature; other values disable it.
Example:
export cjHeapDumpOnOOM=on
cjHeapDumpLog
Specifies the output path for heap dump files. Note that the specified path must exist, and the application executor must have read/write permissions. If not specified, heap dump files will be output to the current execution directory.
Example:
export cjHeapDumpLog=/home/cangjie
Runtime Environment Optional Configurations
MRT_STACK_CHECK
Enables native stack overflow checking. By default, this feature is disabled. Supported values are 1, true, or TRUE to enable the feature.
Example:
export MRT_STACK_CHECK=true
CJ_SOF_SIZE
When a StackOverflowError occurs, the stack trace will be automatically folded for readability. The default number of folded stack frames is 32. This environment variable controls the length of the folded stack. Valid values are integers within the int range:
- CJ_SOF_SIZE = 0: Prints the entire stack trace.
- CJ_SOF_SIZE < 0: Prints the specified number of frames from the bottom of the stack.
- CJ_SOF_SIZE > 0: Prints the specified number of frames from the top of the stack.
- CJ_SOF_SIZE not set: Defaults to printing the top 32 frames of the stack.
Example:
export CJ_SOF_SIZE=30
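The four cases above map naturally onto list slicing. The following Python sketch is an illustration only; the function name is hypothetical, and it assumes the frames are given as a list ordered from the top of the stack (most recent call) to the bottom.

```python
DEFAULT_FRAMES = 32


def folded_stack(frames, setting=None):
    """Select the frames to print according to the CJ_SOF_SIZE rules."""
    if setting is None:          # not set: top 32 frames
        return frames[:DEFAULT_FRAMES]
    size = int(setting)
    if size == 0:                # 0: entire stack trace
        return list(frames)
    if size > 0:                 # > 0: that many frames from the top
        return frames[:size]
    return frames[size:]         # < 0: that many frames from the bottom
```

For a 100-frame stack, CJ_SOF_SIZE=30 keeps the first 30 frames and CJ_SOF_SIZE=-3 keeps the last 3.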
Cangjie GWP-Asan Memory Safety Detection
During interactions between Cangjie and C code, certain Cangjie heap memory safety issues may arise. Cangjie GWP-Asan provides a memory safety detection feature. It can detect Cangjie heap memory safety issues during program execution. GWP-Asan samples the acquireArrayRawData and releaseArrayRawData interfaces provided by the Cangjie standard library (see the “std.core” package section in the Cangjie Programming Language Library API Documentation), comparing Canary data before and after sampling to detect memory safety issues during Cangjie-C interactions.
Cangjie GWP-Asan is a sampling-based detection tool. The sampling frequency can be adjusted by setting different values to balance performance impact and detection coverage. At default or lower sampling frequencies, CPU performance overhead and additional memory usage are minimal.
Note:
Cangjie GWP-Asan memory safety detection is only supported on Linux and OpenHarmony.
cjEnableGwpAsan
Cangjie GWP-Asan memory safety detection is disabled by default. It can be enabled by setting the environment variable cjEnableGwpAsan to 1, true, or TRUE. For Linux, the setting is as follows:
export cjEnableGwpAsan=true
cjGwpAsanSampleRate
When Cangjie GWP-Asan is enabled, the sampling frequency can be set via the cjGwpAsanSampleRate environment variable. cjGwpAsanSampleRate supports positive integers within the 32-bit integer range, i.e., $(0, 2^{31} - 1]$. The default value is 5000, meaning one sample is taken every 5000 acquireArrayRawData calls. For Linux, the setting is as follows:
export cjGwpAsanSampleRate=1000
Note:
In Cangjie GWP-Asan memory safety detection, sampling impacts performance. Higher sampling rates result in greater performance overhead but detect more issues; lower sampling rates reduce performance impact but detect fewer issues. Adjust the sampling rate according to actual requirements.
cjGwpAsanHelp
The environment variable cjGwpAsanHelp controls whether to display GWP-Asan help information in the console. By default, this feature is disabled. When cjGwpAsanHelp is set to 1, true, or TRUE, help information will be printed to the console. For Linux, configure as follows:
export cjGwpAsanHelp=true
Constraints
- Cangjie GWP-Asan is a sampling-based memory checking tool, which may not detect all memory boundary violations.
- Cangjie GWP-Asan has limited detection scope for out-of-bounds access on Cangjie heap memory. It cannot detect out-of-bounds reads and detects only some out-of-bounds writes:
  - Forward write out-of-bounds within 8 bytes.
  - Backward write out-of-bounds into the padding area at the end (padding size varies from 0 to 7 bytes depending on array object length).
Error Detection Types
Heap Memory Write Out-of-Bounds
Heap memory write out-of-bounds occurs when a pointer accesses memory beyond the allocated array length, causing a Cangjie heap memory write violation.
- Forward Out-of-Bounds
For forward out-of-bounds array access, the runtime reports a “Head canary” check failure, indicated as array[-1]. Example:
unsafe {
    let array = Array<UInt8>(4, item: 0)
    let cp = acquireArrayRawData(array)
    // The valid access range for the array is [0, 4).
    // The following write operation accesses the -2nd byte, causing a 2-byte forward out-of-bounds violation.
    // The error report uses array[-1] to indicate this behavior.
    cp.pointer.read(-2)
    releaseArrayRawData(array)
}
Corresponding error report:
2025-05-22 10:57:13.432786 41217 F Gwp-Asan sanity check failed on raw array addr 0x7f7c887368
2025-05-22 10:57:13.432863 41217 F Head canary (array[-1]) mismatch: expect: 0x2, actual: 0x200000000000002
2025-05-22 10:57:13.432878 41217 F Gwp-Asan Aborted.
- Backward Out-of-Bounds
For backward out-of-bounds array access, the runtime reports a “Tail canary” check failure, indicating the relative position beyond the array (array[size+1]). Example:
unsafe {
    let array = Array<UInt8>(4, item: 0)
    let cp = acquireArrayRawData(array)
    // The valid access range for the array is [0, 4).
    // The following write operation accesses the 6th byte, causing a 2-byte backward out-of-bounds violation.
    // The error report uses array[size+1] to indicate this behavior.
    cp.pointer.read(5)
    releaseArrayRawData(array)
}
Corresponding error report:
2025-05-22 10:53:09.564580 37872 F Gwp-Asan sanity check failed on raw array addr 0x7f6278a368
2025-05-22 10:53:09.564761 37872 F Tail canary (array[size+1]) mismatch: expect: 0x6, actual: 0x2
2025-05-22 10:53:09.564788 37872 F Gwp-Asan Aborted.
Cangjie GC (Garbage Collection) Exception
Failure to release an array reference with releaseArrayRawData after acquiring its pointer via acquireArrayRawData may cause Cangjie GC (Garbage Collection) exceptions.
During runtime shutdown, sampled arrays are checked for proper release via releaseArrayRawData. Unreleased arrays will report their heap addresses. Example:
unsafe {
let array = Array<UInt8>(4, item: 0)
let cp = acquireArrayRawData(array)
cp.pointer.read()
// Missing releaseArrayRawData
return
}
Corresponding error report:
2025-05-22 10:53:09.564761 1248614 F Unreleased array: 0x7fffd77f92d8
2025-05-22 10:53:09.564788 1248614 F Detect un-released array
Keywords
Keywords are special strings that cannot be used as identifiers. The keywords in the Cangjie language are listed in the following table:
| Keyword | Keyword | Keyword |
|---|---|---|
| as | abstract | break |
| Bool | case | catch |
| class | const | continue |
| Rune | do | else |
| enum | extend | for |
| func | false | finally |
| foreign | Float16 | Float32 |
| Float64 | if | in |
| is | init | import |
| interface | Int8 | Int16 |
| Int32 | Int64 | IntNative |
| let | mut | main |
| macro | match | Nothing |
| open | operator | override |
| prop | public | package |
| private | protected | quote |
| redef | return | spawn |
| super | static | struct |
| synchronized | try | this |
| true | type | throw |
| This | unsafe | Unit |
| UInt8 | UInt16 | UInt32 |
| UInt64 | UIntNative | var |
| VArray | where | while |
Operators
The following table lists all operators supported by Cangjie, along with their precedence and associativity. In the precedence column, a smaller numerical value indicates higher operator precedence.
| Operator | Precedence | Meaning | Example | Associativity |
|---|---|---|---|---|
| @ | 0 | Macro invocation | @id | Right |
| . | 1 | Member access | expr.id | Left |
| [] | 1 | Indexing | expr[expr] | Left |
| () | 1 | Function call | expr(expr) | Left |
| ++ | 2 | Increment | var++ | None |
| -- | 2 | Decrement | var-- | None |
| ? | 2 | Question mark | expr?.id, expr?[expr], expr?(expr), expr?{expr} | None |
| ! | 3 | Bitwise NOT, Logical NOT | !expr | Right |
| - | 3 | Unary minus | -expr | Right |
| ** | 4 | Exponentiation | expr ** expr | Right |
| *, / | 5 | Multiplication, Division | expr * expr, expr / expr | Left |
| % | 5 | Modulo | expr % expr | Left |
| +, - | 6 | Addition, Subtraction | expr + expr, expr - expr | Left |
| << | 7 | Bitwise left shift | expr << expr | Left |
| >> | 7 | Bitwise right shift | expr >> expr | Left |
| .. | 8 | Left-closed right-open range | expr..expr | None |
| ..= | 8 | Closed range | expr..=expr | None |
| < | 9 | Less than | expr < expr | None |
| <= | 9 | Less than or equal | expr <= expr | None |
| > | 9 | Greater than | expr > expr | None |
| >= | 9 | Greater than or equal | expr >= expr | None |
| is | 9 | Type check | expr is Type | None |
| as | 9 | Type conversion | expr as Type | None |
| == | 10 | Equality | expr == expr | None |
| != | 10 | Inequality | expr != expr | None |
| & | 11 | Bitwise AND | expr & expr | Left |
| ^ | 12 | Bitwise XOR | expr ^ expr | Left |
| \| | 13 | Bitwise OR | expr \| expr | Left |
| && | 14 | Logical AND | expr && expr | Left |
| \|\| | 15 | Logical OR | expr \|\| expr | Left |
| ?? | 16 | Coalescing operator | expr ?? expr | Right |
| \|> | 17 | Pipeline operator | id \|> expr | Left |
| ~> | 17 | Composition operator | expr ~> expr | Left |
| = | 18 | Assignment | id = expr | None |
| **= | 18 | Compound operator | id **= expr | None |
| *= | 18 | Compound operator | id *= expr | None |
| /= | 18 | Compound operator | id /= expr | None |
| %= | 18 | Compound operator | id %= expr | None |
| += | 18 | Compound operator | id += expr | None |
| -= | 18 | Compound operator | id -= expr | None |
| <<= | 18 | Compound operator | id <<= expr | None |
| >>= | 18 | Compound operator | id >>= expr | None |
| &= | 18 | Compound operator | id &= expr | None |
| ^= | 18 | Compound operator | id ^= expr | None |
| \|= | 18 | Compound operator | id \|= expr | None |
| &&= | 18 | Compound operator | id &&= expr | None |
| \|\|= | 18 | Compound operator | id \|\|= expr | None |
Operator Functions
The following table lists all operator functions supported by Cangjie.
| Operator Function | Function Signature | Example |
|---|---|---|
| [] (Index Access) | operator func [](index1: T1, index2: T2, ...): R | this[index1, index2, ...] |
| [] (Index Assignment) | operator func [](index1: T1, index2: T2, ..., value!: TN): R | this[index1, index2, ...] = value |
| () | operator func ()(param1: T1, param2: T2, ...): R | this(param1, param2, ...) |
| ! | operator func !(): R | !this |
| ** | operator func **(other: T): R | this ** other |
| * | operator func *(other: T): R | this * other |
| / | operator func /(other: T): R | this / other |
| % | operator func %(other: T): R | this % other |
| + | operator func +(other: T): R | this + other |
| - | operator func -(other: T): R | this - other |
| << | operator func <<(other: T): R | this << other |
| >> | operator func >>(other: T): R | this >> other |
| < | operator func <(other: T): R | this < other |
| <= | operator func <=(other: T): R | this <= other |
| > | operator func >(other: T): R | this > other |
| >= | operator func >=(other: T): R | this >= other |
| == | operator func ==(other: T): R | this == other |
| != | operator func !=(other: T): R | this != other |
| & | operator func &(other: T): R | this & other |
| ^ | operator func ^(other: T): R | this ^ other |
| \| | operator func \|(other: T): R | this \| other |
TokenKind Type
public enum TokenKind <: ToString {
DOT| /* "." */
COMMA| /* "," */
LPAREN| /* "(" */
RPAREN| /* ")" */
LSQUARE| /* "[" */
RSQUARE| /* "]" */
LCURL| /* "{" */
RCURL| /* "}" */
EXP| /* "**" */
MUL| /* "*" */
MOD| /* "%" */
DIV| /* "/" */
ADD| /* "+" */
SUB| /* "-" */
INCR| /* "++" */
DECR| /* "--" */
AND| /* "&&" */
OR| /* "||" */
COALESCING| /* "??" */
PIPELINE| /* "|>" */
COMPOSITION| /* "~>" */
NOT| /* "!" */
BITAND| /* "&" */
BITOR| /* "|" */
BITXOR| /* "^" */
BITNOT| /* "~" */
LSHIFT| /* "<<" */
RSHIFT| /* ">>" */
COLON| /* ":" */
SEMI| /* ";" */
ASSIGN| /* "=" */
ADD_ASSIGN| /* "+=" */
SUB_ASSIGN| /* "-=" */
MUL_ASSIGN| /* "*=" */
EXP_ASSIGN| /* "**=" */
DIV_ASSIGN| /* "/=" */
MOD_ASSIGN| /* "%=" */
AND_ASSIGN| /* "&&=" */
OR_ASSIGN| /* "||=" */
BITAND_ASSIGN| /* "&=" */
BITOR_ASSIGN| /* "|=" */
BITXOR_ASSIGN| /* "^=" */
LSHIFT_ASSIGN| /* "<<=" */
RSHIFT_ASSIGN| /* ">>=" */
ARROW| /* "->" */
BACKARROW| /* "<-" */
DOUBLE_ARROW| /* "=>" */
RANGEOP| /* ".." */
CLOSEDRANGEOP| /* "..=" */
ELLIPSIS| /* "..." */
HASH| /* "#" */
AT| /* "@" */
QUEST| /* "?" */
LT| /* "<" */
GT| /* ">" */
LE| /* "<=" */
GE| /* ">=" */
IS| /* "is" */
AS| /* "as" */
NOTEQ| /* "!=" */
EQUAL| /* "==" */
WILDCARD| /* "_" */
INT8| /* "Int8" */
INT16| /* "Int16" */
INT32| /* "Int32" */
INT64| /* "Int64" */
INTNATIVE| /* "IntNative" */
UINT8| /* "UInt8" */
UINT16| /* "UInt16" */
UINT32| /* "UInt32" */
UINT64| /* "UInt64" */
UINTNATIVE| /* "UIntNative" */
FLOAT16| /* "Float16" */
FLOAT32| /* "Float32" */
FLOAT64| /* "Float64" */
RUNE| /* "Rune" */
BOOLEAN| /* "Bool" */
NOTHING| /* "Nothing" */
UNIT| /* "Unit" */
STRUCT| /* "struct" */
ENUM| /* "enum" */
VARRAY| /* "VArray" */
THISTYPE| /* "This" */
PACKAGE| /* "package" */
IMPORT| /* "import" */
CLASS| /* "class" */
INTERFACE| /* "interface" */
FUNC| /* "func" */
MACRO| /* "macro" */
QUOTE| /* "quote" */
DOLLAR| /* "$" */
LET| /* "let" */
VAR| /* "var" */
CONST| /* "const" */
TYPE| /* "type" */
INIT| /* "init" */
THIS| /* "this" */
SUPER| /* "super" */
IF| /* "if" */
ELSE| /* "else" */
CASE| /* "case" */
TRY| /* "try" */
CATCH| /* "catch" */
FINALLY| /* "finally" */
FOR| /* "for" */
DO| /* "do" */
WHILE| /* "while" */
THROW| /* "throw" */
RETURN| /* "return" */
CONTINUE| /* "continue" */
BREAK| /* "break" */
IN| /* "in" */
NOT_IN| /* "!in" */
MATCH| /* "match" */
WHERE| /* "where" */
EXTEND| /* "extend" */
WITH| /* "with" */
PROP| /* "prop" */
STATIC| /* "static" */
PUBLIC| /* "public" */
PRIVATE| /* "private" */
INTERNAL| /* "internal" */
PROTECTED| /* "protected" */
OVERRIDE| /* "override" */
REDEF| /* "redef" */
ABSTRACT| /* "abstract" */
SEALED| /* "sealed" */
OPEN| /* "open" */
FOREIGN| /* "foreign" */
INOUT| /* "inout" */
MUT| /* "mut" */
UNSAFE| /* "unsafe" */
OPERATOR| /* "operator" */
SPAWN| /* "spawn" */
SYNCHRONIZED| /* "synchronized" */
UPPERBOUND| /* "<:" */
MAIN| /* "main" */
IDENTIFIER| /* "x" */
PACKAGE_IDENTIFIER| /* "x-y" */
INTEGER_LITERAL| /* e.g. "1" */
RUNE_BYTE_LITERAL| /* e.g. "b'x'" */
FLOAT_LITERAL| /* e.g. "'1.0'" */
COMMENT| /* e.g. "//xx" */
NL| /* newline */
END| /* end of file */
SENTINEL| /* ";" */
RUNE_LITERAL| /* e.g. "r'x'" */
STRING_LITERAL| /* e.g. ""xx"" */
SINGLE_QUOTED_STRING_LITERAL|
/* e.g. "'xx'" */
JSTRING_LITERAL| /* e.g. "J"xx"" */
MULTILINE_STRING| /* e.g. """"aaa"""" */
MULTILINE_RAW_STRING| /* e.g. "#"aaa"#" */
BOOL_LITERAL| /* "true" or "false" */
UNIT_LITERAL| /* "()" */
DOLLAR_IDENTIFIER| /* e.g. "$x" */
ANNOTATION| /* e.g. "@When" */
AT_EXCL| /* e.g. "@!" */
ILLEGAL|
HANDLE| /* "handle" */
PERFORM| /* "perform" */
RESUME| /* "resume" */
THROWING| /* "throwing" */
...
}
Cangjie Package Compatibility Check
This chapter introduces the Cangjie Package Compatibility Check, a feature available starting from version 0.59.4. When the Cangjie runtime loads Cangjie packages, it performs binary compatibility checks to help developers identify compatibility issues, although these checks cannot catch all binary compatibility problems.
Note:
This feature applies only to version 0.59.4 and later. If the runtime or standard library is a version earlier than 0.59.4, neither compatibility nor normal operation is guaranteed.
Check Rules
Assume the version number of the Cangjie runtime is a.b.c, and the version number of the Cangjie package to be loaded is x.y.z. Compatibility is satisfied if any of the following conditions are met:
- When both a and x are 0: a == x && b == y && c == z.
- When both a and x are not 0: a == x.
If the two versions are compatible, the subsequent package loading process continues. If the compatibility requirements are not met, the following two error scenarios may occur:
- Scenario 1: If the loaded package is the Cangjie core package, the Cangjie runtime terminates execution, and the error message includes the Cangjie runtime version number and the core package version number.
F executable cangjie file libcangjie-std-core.so version 0.59.3 is not compatible with deployed cangjie runtime version 0.59.5
- Scenario 2: If the loaded package is any package other than the Cangjie core package, the Cangjie runtime reports an error and throws an IncompatiblePackageException, with the error message including the Cangjie runtime version number and the loaded package version number.
E executable cangjie file liba.so version 0.59.5 is not compatible with deployed cangjie runtime version 0.59.3
An exception has occurred:
IncompatiblePackageException: executable cangjie file liba.so version 0.59.5 is not compatible with deployed cangjie runtime version 0.59.3
    at package_global_init(:0)
    at package_global_init(:0)
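The check rules in this chapter can be expressed as a small predicate. The following Python sketch is an illustration of those rules, not the runtime's code; the function names are hypothetical, and the sketch treats the case where exactly one major version is 0 as incompatible, since neither listed condition applies.

```python
def parse_version(text):
    """Split 'a.b.c' into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(runtime_version, package_version):
    """Apply the check rules: runtime is a.b.c, package is x.y.z."""
    a, b, c = parse_version(runtime_version)
    x, y, z = parse_version(package_version)
    if a == 0 and x == 0:
        # Pre-1.0 versions must match exactly.
        return (b, c) == (y, z)
    if a != 0 and x != 0:
        # Otherwise only the major version must match.
        return a == x
    return False  # neither condition applies
```

Applied to the error messages shown above, runtime 0.59.5 with package 0.59.3 and runtime 0.59.3 with package 0.59.5 are both rejected because the versions are not identical.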
Tool Usage Guide
The Cangjie language provides developers with a comprehensive suite of command-line tools and language server tools, covering project management, command-line debugging, code formatting, and other functionalities. After successfully installing the Cangjie toolchain, you can use these tools according to the manual instructions. For installation instructions of the Cangjie toolchain, please refer to Installing the Cangjie Toolchain.
Project Management Tool
Overview
cjpm (Cangjie Project Manager) is the official project management tool for the Cangjie language, designed to manage and maintain the module system of Cangjie projects. It covers operations such as module initialization, dependency checking, and updates. It provides a unified compilation entry point, supporting incremental compilation, parallel compilation, and custom compilation commands.
Usage Instructions
Execute the cjpm -h command to view the usage instructions for the project management tool, as shown below.
Cangjie Project Manager
Usage:
cjpm [subcommand] [option]
Available subcommands:
init Init a new cangjie module
check Check the dependencies
update Update cjpm.lock
tree Display the package dependencies in the source code
build Compile the current module
run Compile and run an executable product
test Unittest a local package or module
bench Run benchmarks in a local package or module
clean Clean up the target directory
install Install a cangjie binary
uninstall Uninstall a cangjie binary
Available options:
-h, --help help for cjpm
-v, --version version for cjpm
Use "cjpm [subcommand] --help" for more information about a command.
Basic usage commands are as follows:
cjpm build --help
cjpm is the name of the main program, build is the currently executed available command, and --help is the available configuration option for the current command (configuration options typically have both long and short forms with the same effect).
Upon successful execution, the following result will be displayed:
Compile a local module and all of its dependencies.
Usage:
cjpm build [option]
Available options:
-h, --help help for build
-i, --incremental enable incremental compilation
-j, --jobs <N> the number of jobs to spawn in parallel during the build process
-V, --verbose enable verbose
-g enable compile debug version target
--coverage enable coverage
--cfg enable the customized option 'cfg'
--enable-features <value> explicitly specify comma-separated list of features to be enabled
--no-feature-deduce disables auto-enabling of features, deduced from other options, machine properties, etc.
-m, --member <value> specify a member module of the workspace
--target <value> generate code for the given target platform
--target-dir <value> specify target directory
-o, --output <value> specify product name when compiling an executable file
-l, --lint enable cjlint code check
--mock enable support of mocking classes in tests
--skip-script disable script 'build.cj'.
Note:
When using cjpm central repository features, external dependencies are required. Please refer to the stdx.net.tls library documentation and follow the instructions to install the necessary external dependencies.
Command Descriptions
init
init is used to initialize a new Cangjie module or workspace. When initializing a module, it creates a configuration file cjpm.toml in the current folder by default and creates a src source code folder. If the module’s output is of the executable type, it generates a default main.cj file under src, which prints hello world after compilation. When initializing a workspace, only the cjpm.toml file is created, and it scans existing Cangjie modules under the path and adds them to the members field by default. If cjpm.toml already exists or main.cj is already present in the source folder, the corresponding file creation steps will be skipped.
init has several configurable options:
- --workspace creates a new workspace configuration file. When this option is specified, other options are automatically ignored.
- --name <value> specifies the root package name of the new module. If not specified, it defaults to the name of the parent folder.
- --path <value> specifies the path for the new module. If not specified, it defaults to the current folder.
- --type=<executable|static|dynamic> specifies the output type of the new module. If omitted, it defaults to executable.
- --experimental initializes the Cangjie multi-platform project.
Examples:
Input: cjpm init
Output: cjpm init success
Input: cjpm init --name demo --path project
Output: cjpm init success
Input: cjpm init --type=static
Output: cjpm init success
check
The check command is used to check the dependencies required by the project. Upon successful execution, it prints the valid package compilation order.
check has several configurable options:
- -m, --member <value> can only be used in a workspace to specify a single module as the check entry point.
- --no-tests when configured, test-related dependencies will not be printed.
- --skip-script when configured, the build script compilation and execution will be skipped.
Examples:
Input: cjpm check
Output:
The valid serial compilation order is:
b.pkgA -> b
cjpm check success
Input: cjpm check
Output:
Error: cyclic dependency
b.B -> c.C
c.C -> d.D
d.D -> b.B
Note: In the above output, b.B represents a subpackage named b.B in the module with b as the root package.
Input: cjpm check
Output:
Error: can not find the following dependencies
pro1.xoo
pro1.yoo
pro2.zoo
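The two main outcomes of check shown above, a valid serial compilation order or a cyclic dependency error, correspond to a topological sort of the package import graph. The following Python sketch illustrates the idea using the standard graphlib module; it is not how cjpm is implemented, and the function name and input format are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def compilation_order(dependencies):
    """Return (order, None) on success or (None, cycle) on a cyclic dependency.

    dependencies maps each package name to the set of packages it imports,
    so imported packages are placed before the packages that import them.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order()), None
    except CycleError as err:
        # args[1] holds the nodes of the detected cycle; the first and
        # last entries are the same node.
        return None, err.args[1]
```

For the first example above, passing {"b": {"b.pkgA"}} yields the order b.pkgA, b. For the cyclic example, passing the b.B, c.C, d.D import graph yields no order and a cycle that contains all three packages.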
update
update synchronizes the contents of cjpm.toml into cjpm.lock. If cjpm.lock does not exist, it is generated. The cjpm.lock file records metadata such as version numbers of git dependencies for use in the next build.
update has the following configurable option:
- --skip-script when configured, the build script compilation and execution will be skipped.
Input: cjpm update
Output: cjpm update success
tree
The tree command is used to visually display the package dependency relationships in Cangjie source code.
tree has several configurable options:
- -V, --verbose adds detailed information to package nodes, including package name, version number, and package path.
- --depth <N> specifies the maximum depth of the dependency tree. The value must be a non-negative integer. When this option is specified, all packages are treated as root nodes by default. The value of N represents the maximum depth of child nodes for each dependency tree.
- -p, --package <value> specifies a package as the root node to display its sub-dependencies. The value required is the package name.
- --invert <value> specifies a package as the root node and inverts the dependency tree to show which packages depend on it. The value required is the package name.
- --target <value> includes dependencies for the specified target platform in the analysis and displays the dependency relationships.
- --no-tests excludes dependencies listed in the test-dependencies field.
- --skip-script when configured, the build script compilation and execution will be skipped.
For example, the source code directory structure of module a is as follows:
src
├── main.cj
├── aoo
│   └── a.cj
├── boo
│   └── b.cj
├── coo
│   └── c.cj
├── doo
│   └── d.cj
└── eoo
    └── e.cj
The dependency relationships are: package a imports packages a.aoo and a.boo; subpackage a.aoo imports package a.coo; subpackage a.boo imports package a.coo; subpackage a.doo imports package a.coo.
Input: cjpm tree
Output:
|-- a
    └── a.aoo
        └── a.coo
    └── a.boo
        └── a.coo
|-- a.doo
    └── a.coo
|-- a.eoo
cjpm tree success
Input: cjpm tree --depth 2 -p a
Output:
|-- a
    └── a.aoo
        └── a.coo
    └── a.boo
        └── a.coo
cjpm tree success
Input: cjpm tree --depth 0
Output:
|-- a
|-- a.eoo
|-- a.aoo
|-- a.boo
|-- a.doo
|-- a.coo
cjpm tree success
Input: cjpm tree --invert a.coo --verbose
Output:
|-- a.coo 1.2.0 (.../src/coo)
    └── a.aoo 1.1.0 (.../src/aoo)
        └── a 1.0.0 (.../src)
    └── a.boo 1.1.0 (.../src/boo)
        └── a 1.0.0 (.../src)
    └── a.doo 1.3.0 (.../src/doo)
cjpm tree success
build
build is used to build the current Cangjie project. Before executing this command, it checks dependencies. If the check passes, it calls cjc to perform the build.
build has several configurable options:
- -i, --incremental specifies incremental compilation. By default, full compilation is performed.
- -j, --jobs <N> specifies the maximum number of parallel compilation jobs. The final maximum concurrency is the minimum of N and 2 times the number of CPU cores.
- -V, --verbose displays compilation logs.
- -g generates debug version output.
- --coverage generates coverage information. By default, coverage is disabled.
- --cfg when specified, custom cfg options in cjpm.toml can be passed through. For cjpm.toml configuration, refer to the profile.customized-option section.
- --enable-features <value> explicitly specifies the features to enable, as a comma-separated list.
- --no-feature-deduce disables automatic feature enabling, including inference from other options or machine-related properties.
- -m, --member <value> can only be used in a workspace to specify a single module as the compilation entry point.
- --target <value> when specified, enables cross-compilation to the target platform. For cjpm.toml configuration, refer to the target section.
- --target-dir <value> specifies the output directory for the build artifacts.
- -o, --output <value> specifies the name of the output executable file. The default name is main (main.exe on Windows). Note that compiling an executable named cjc is currently not supported.
- -l, --lint invokes the Cangjie language static analysis tool for code inspection during compilation.
- --mock enables mock testing for classes in the build version.
- --skip-script when configured, the build script compilation and execution will be skipped.
Note:
- The -i, --incremental option only enables package-level incremental compilation in cjpm. Developers can manually pass the --incremental-compile and --experimental compilation options in the configuration file’s compile-option field to enable function-level incremental compilation provided by the cjc compiler.
- The -i, --incremental option currently only supports incremental analysis based on source code. If imported library content changes, developers need to rebuild using full compilation.
- Source files ending with _test.cj and test cases within ordinary source files are ignored during the build phase. Other compilation-related commands (run/install/bundle) also ignore these files during the compilation phase.
Intermediate files generated during compilation are stored in the target folder by default, while executable files are stored in target/release/bin or target/debug/bin folders based on the compilation mode. To run the executable, refer to the run command.
To ensure reproducible builds, this command creates a cjpm.lock file containing the exact versions of all transitive dependencies, which will be used for subsequent builds. To update this file, use the update command. If reproducible builds are required for all project participants, this file should be committed to the version control system.
Examples:
Input: cjpm build -V
Output:
compile package module1.package1: cjc --import-path "target/release" --output-dir "target/release/module1" -p "src/package1" --output-type=staticlib -o libmodule1.package1.a
compile package module1: cjc --import-path "target/release" --output-dir "target/release/bin" -p "src" --output-type=exe -o main
cjpm build success
Input: cjpm build
Output: cjpm build success
Note:
According to the Cangjie package management specifications, only valid source code packages that meet the requirements can be correctly included in the compilation scope. If warnings like no '.cj' file appear during compilation, it is likely because the corresponding package does not meet the specifications, causing the source files to be excluded. In such cases, refer to the Cangjie Package Management Specifications to modify the code directory structure.
Before executing cjpm build, cjpm checks the package dependency relationships of the current module or workspace. If mutual imports between packages form a dependency cycle, the build will be aborted, and an error message will be returned, indicating the cyclic dependency path.
For example, the source code directory structure of module demo is as follows:
src
├── main.cj
├── aoo
│   └── a.cj
├── boo
│   └── b.cj
└── coo
    └── c.cj
The dependency relationships are: package demo.aoo imports package demo.boo, package demo.boo imports package demo.coo, and package demo.coo imports package demo.aoo. The mutual imports between these three packages form a cycle, resulting in a cyclic dependency:
Input: cjpm build
Output:
cyclic dependency:
demo.boo -> demo.coo
demo.coo -> demo.aoo
demo.aoo -> demo.boo
Error: cjpm build failed
When a cyclic dependency occurs, developers can troubleshoot based on the error message. In the example above, the import chain is demo.aoo -> demo.boo -> demo.coo -> demo.aoo. Developers can start analyzing from each package’s directory to identify and remove unnecessary imports to resolve the cyclic dependency. For example, start with demo.aoo and check which source files import demo.boo. If these files do not functionally depend on demo.boo, the corresponding imports can be removed. Repeat this process for demo.boo and demo.coo to eliminate redundant imports and resolve the cyclic dependency.
If functional cyclic dependencies are confirmed, consider the following solutions:
- Refactor import order: Ensure dependencies are unidirectional to avoid cycles. For example, the part of demo.coo that depends on demo.aoo can be independently implemented to break the cycle.
- Split modules: Move interdependent code into a separate package. For example, merge the three subpackages into one.
In other commands involving package dependency resolution (e.g., tree), similar errors will appear for cyclic dependencies, and the same solutions can be applied.
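To make the cyclic dependency check more concrete, the following Python sketch finds one import cycle in a package graph using a depth-first search. It is an illustration of the general technique under the assumption that imports are given as a plain dictionary; it does not reflect how cjpm implements the check, and find_cycle is a hypothetical name.

```python
from typing import Dict, List, Optional


def find_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of package names, or None if acyclic."""
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(pkg: str) -> Optional[List[str]]:
        state[pkg] = ON_PATH
        path.append(pkg)
        for dep in imports.get(pkg, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == ON_PATH:
                # dep is already on the current path: cut out the cycle and close it.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[pkg] = DONE
        return None

    for pkg in imports:
        if state.get(pkg, UNVISITED) == UNVISITED:
            found = visit(pkg)
            if found:
                return found
    return None


# The demo module from the example above.
demo_imports = {
    "demo": [],
    "demo.aoo": ["demo.boo"],
    "demo.boo": ["demo.coo"],
    "demo.coo": ["demo.aoo"],
}
cycle = find_cycle(demo_imports)
print(" -> ".join(cycle))  # prints demo.aoo -> demo.boo -> demo.coo -> demo.aoo
```

Removing any one edge of the reported chain (for example, the import of demo.aoo in demo.coo) makes find_cycle return None for this graph.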
run
run is used to execute the binary output of the current project. The run command implicitly executes the build command to generate the final binary for execution.
run has several configurable options:
- --name <value>: Specifies the name of the binary to run. If not specified, it defaults to main. In a workspace, binaries are stored in target/release/bin by default.
- --build-args <value>: Controls the parameters for the build compilation process.
- --skip-build: Skips the compilation process and directly runs the binary.
- --run-args <value>: Passes arguments to the binary being executed.
- --target-dir <value>: Specifies the output directory for the executable.
- -g: Runs the debug version of the binary.
- -V, --verbose: Displays execution logs.
- --skip-script: When configured, the build script compilation and execution will be skipped.
Examples:
Input: cjpm run
Output: cjpm run success
Input: cjpm run -g // This implicitly executes cjpm build -i -g
Output: cjpm run success
Input: cjpm run --build-args="-s -j16" --run-args="a b c"
Output: cjpm run success
test
test is used to compile and run unit test cases for Cangjie files, printing the test results upon completion. The compiled output is stored in target/release/unittest_bin by default. For writing unit test cases, refer to the std.unittest library documentation in the Cangjie Programming Language Standard Library API.
This command can specify the path of a single package to test (multiple packages can be specified, e.g., cjpm test path1 path2). If no path is specified, module-level unit tests are executed by default. During module-level unit testing, only the current module’s tests are executed; tests in directly or indirectly dependent modules are not run. The test command requires the current project to compile successfully with build.
The unit test code structure for a module is as follows, where xxx.cj contains the package’s source code and xxx_test.cj contains the unit test code:
├── src
│   ├── koo
│   │   ├── koo.cj
│   │   └── koo_test.cj
│   ├── zoo
│   │   ├── zoo.cj
│   │   └── zoo_test.cj
│   ├── main.cj
│   └── main_test.cj
└── cjpm.toml
- Single-module test scenario:

Input: cjpm test
Progress report:
group test.koo 100% [||||||||||||||||||||||||||||] ✓ (00:00:01)
group test.zoo 0% [----------------------------] (00:00:00)
test TestZ.sayhi (00:00:00)
passed: 1, failed: 0 33% [|||||||||-------------------] 1/3 (00:00:01)
Output:
--------------------------------------------------------------------------------------------------
TP: test, time elapsed: 177921 ns, RESULT:
TCS: TestM, time elapsed: 177921 ns, RESULT:
[ PASSED ] CASE: sayhi (177921 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
TP: test.koo, time elapsed: 134904 ns, RESULT:
TCS: TestK, time elapsed: 134904 ns, RESULT:
[ PASSED ] CASE: sayhi (134904 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
TP: test.zoo, time elapsed: 132013 ns, RESULT:
TCS: TestZ, time elapsed: 132013 ns, RESULT:
[ PASSED ] CASE: sayhi (132013 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
Project tests finished, time elapsed: 444838 ns, RESULT:
TP: test.*, time elapsed: 312825 ns, RESULT:
PASSED:
TP: test.zoo, time elapsed: 132013 ns
TP: test.koo, time elapsed: 312825 ns
TP: test, time elapsed: 312825 ns
Summary: TOTAL: 3
PASSED: 3, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
cjpm test success

- Single-package test scenario:

Input: cjpm test src/koo
Output:
--------------------------------------------------------------------------------------------------
TP: test.koo, time elapsed: 160133 ns, RESULT:
TCS: TestK, time elapsed: 160133 ns, RESULT:
[ PASSED ] CASE: sayhi (160133 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
Project tests finished, time elapsed: 160133 ns, RESULT:
TP: test.*, time elapsed: 160133 ns, RESULT:
PASSED:
TP: test.koo, time elapsed: 160133 ns
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
cjpm test success

- Multi-package test scenario:

Input: cjpm test src/koo src
Output:
--------------------------------------------------------------------------------------------------
TP: test.koo, time elapsed: 168204 ns, RESULT:
TCS: TestK, time elapsed: 168204 ns, RESULT:
[ PASSED ] CASE: sayhi (168204 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
TP: test, time elapsed: 171541 ns, RESULT:
TCS: TestM, time elapsed: 171541 ns, RESULT:
[ PASSED ] CASE: sayhi (171541 ns)
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
Project tests finished, time elapsed: 339745 ns, RESULT:
TP: test.*, time elapsed: 339745 ns, RESULT:
PASSED:
TP: test.koo, time elapsed: 339745 ns
TP: test, time elapsed: 339745 ns
Summary: TOTAL: 2
PASSED: 2, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
cjpm test success
test has multiple configurable options:
- -j, --jobs <N>: Specifies the maximum number of parallel compilation jobs. The final maximum concurrency is the minimum of N and 2 times the number of CPU cores.
- -V, --verbose: When enabled, outputs unit test logs.
- -g: Generates a debug version of the unit test artifacts, which are stored in the target/debug/unittest_bin directory.
- -i, --incremental: Specifies incremental compilation of test code. Full compilation is performed by default.
- --no-run: Compiles only the unit test artifacts without execution.
- --skip-build: Executes only the pre-built unit test artifacts.
- --coverage: Generates raw coverage data. When using cjpm test --coverage, the main function in the source code will not be executed as the program entry point and will appear as uncovered. It is recommended to avoid writing redundant main functions after using cjpm test.
- --cfg: When specified, custom cfg options from cjpm.toml are passed through.
- --module <value>: Specifies the target test module, which must be directly or indirectly dependent on the current module (or be the module itself). Multiple modules can be specified using --module "module1 module2". If not specified, only the current module is tested by default.
- -m, --member <value>: Can only be used in a workspace to specify testing a single module.
- --target <value>: When specified, cross-compiles unit test artifacts for the target platform. Refer to the target section for cjpm.toml configurations.
- --target-dir <value>: Specifies the output directory for unit test artifacts.
- --dry-run: When enabled, only prints the test cases without execution.
- --filter <value>: Filters a subset of tests. The value format is as follows:
  - --filter=*: Matches all test classes.
  - --filter=*.*: Matches all test cases of all test classes (same result as *).
  - --filter=*.*Test,*.*case*: Matches all test cases ending with Test in any test class, or all test cases containing case in their names.
  - --filter=MyTest*.*Test,*.*case*,-*.*myTest: Matches all test cases ending with Test in test classes starting with MyTest, or test cases containing case in their names, excluding test cases containing myTest in their names.
- --include-tags <value>: Runs a subset of tests marked with the @Tag macro. The value format is as follows:
  - --include-tags=Unittest: Runs all tests marked with @Tag[Unittest].
  - --include-tags=Unittest,Smoke: Runs all tests marked with either @Tag[Unittest] or @Tag[Smoke] (or both).
  - --include-tags=Unittest+Smoke: Runs all tests marked with both @Tag[Unittest] and @Tag[Smoke].
  - --include-tags=Unittest+Smoke+JiraTask3271,Backend: Runs all tests marked with @Tag[Backend] or with all of @Tag[Unittest, Smoke, JiraTask3271].
- --exclude-tags <value>: Excludes a subset of tests marked with the @Tag macro. The value format is as follows:
  - --exclude-tags=Unittest: Runs all tests not marked with @Tag[Unittest].
  - --exclude-tags=Unittest,Smoke: Runs all tests not marked with either @Tag[Unittest] or @Tag[Smoke] (or both).
  - --exclude-tags=Unittest+Smoke+JiraTask3271: Runs all tests not marked with all of @Tag[Unittest, Smoke, JiraTask3271].
  - --include-tags=Unittest --exclude-tags=Smoke: Runs all tests marked with @Tag[Unittest] but not @Tag[Smoke].
- --no-color: Disables colored console output.
- --random-seed <N>: Specifies the value of the random seed.
- --timeout-each <value>: Specifies the default timeout for each test case. The format is %d[millis|s|m|h].
- --parallel: Specifies the parallel execution scheme for test cases. The value format is as follows:
  - <BOOL>: Can be true or false. When true, test classes can run in parallel, with the number of parallel processes controlled by the available CPU cores.
  - nCores: Specifies that the number of parallel test processes equals the available CPU cores.
  - NUMBER: Specifies a positive integer for the number of parallel test processes.
  - NUMBERnCores: Specifies that the number of parallel test processes is a multiple of the available CPU cores. The value must be positive (supports floating-point or integer values).
- --show-tags: Displays @Tag information in the test report. In --dry-run mode with xml format reports, Tag information is always included.
- --show-all-output: Enables output printing for all test cases, including passed ones.
- --no-capture-output: Disables test output capture, printing output immediately during test execution.
- --report-path <value>: Specifies the path for generating test execution reports.
- --report-format <value>: Specifies the report output format. Currently, only the xml and xml-per-package formats (case-insensitive) are supported for unit test reports. Other values will throw an exception.
- --skip-script: Skips the compilation and execution of build scripts.
- --no-progress: Disables progress reporting. Implicitly enabled if --dry-run is specified.
- --progress-brief: Displays a brief (single-line) progress report instead of a detailed one.
- --progress-entries-limit <value>: Limits the number of entries displayed in the progress report. Default: 0. Allowed values:
  - 0: No limit on the number of entries.
  - n: Where n is a positive integer, specifies the maximum number of entries displayed simultaneously in the terminal.
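The --filter value formats listed in the options above can be approximated with shell-style wildcard matching. The Python sketch below is only a reading aid: it assumes that a pattern has the form Class.case, that a pattern without a dot selects every case in the matching classes, and that a leading - marks an exclusion that overrides inclusions. These semantics, and the names parse_filter and is_selected, are assumptions for illustration rather than the test framework's actual algorithm. Note that fnmatchcase also treats ? and [...] as wildcards, which the filter syntax may not.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Pattern = Tuple[str, str]  # (class pattern, case pattern)


def parse_filter(value: str) -> Tuple[List[Pattern], List[Pattern]]:
    """Split a --filter value into (include, exclude) pattern lists."""
    include: List[Pattern] = []
    exclude: List[Pattern] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        target = include
        if item.startswith("-"):
            target, item = exclude, item[1:]
        # A pattern without a dot names test classes only: match all their cases.
        cls, _, case = item.partition(".")
        target.append((cls, case or "*"))
    return include, exclude


def is_selected(test_class: str, test_case: str, value: str) -> bool:
    """Decide whether test_class.test_case is selected by the filter value."""
    include, exclude = parse_filter(value)

    def hit(patterns: List[Pattern]) -> bool:
        return any(
            fnmatchcase(test_class, c) and fnmatchcase(test_case, k)
            for c, k in patterns
        )

    # With no positive pattern, everything is a candidate; exclusions always win.
    return (not include or hit(include)) and not hit(exclude)


flt = "MyTest*.*Test,*.*case*,-*.*myTest"
print(is_selected("MyTestA", "fooTest", flt))  # prints True
print(is_selected("MyTestA", "myTest", flt))   # prints False
```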
Examples of cjpm test parameter usage:
Input:
cjpm test src --coverage
cjcov --root=./ --html-details -o html_output
Output: cjpm test success
Coverage generation: HTML files are generated in the html_output directory, with the main coverage report file named index.html.
Input: cjpm test --filter=*
Output: cjpm test success
Input: cjpm test src --report-path=reports --report-format=xml
Output: cjpm test success
Note:
cjpm test automatically builds all packages with mock support, allowing developers to perform mock tests on custom classes or classes from dependent modules. To enable mock for classes from binary dependencies, build with mock support using cjpm build --mock. Avoid writing redundant main functions after using cjpm test.
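The --timeout-each value format %d[millis|s|m|h] from the option list can be read as an integer followed by a unit. The Python sketch below converts such a value to milliseconds. It assumes that the unit suffix is mandatory and that m means minutes and h means hours; the helper name parse_timeout is hypothetical and the sketch is not taken from the test framework.

```python
import re

# Assumed unit meanings: milliseconds, seconds, minutes, hours.
_UNIT_TO_MILLIS = {"millis": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_TIMEOUT_RE = re.compile(r"([0-9]+)(millis|s|m|h)")


def parse_timeout(value: str) -> int:
    """Convert a value such as '10s' or '500millis' to milliseconds."""
    match = _TIMEOUT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected %d[millis|s|m|h], got {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_MILLIS[unit]


print(parse_timeout("10s"))  # prints 10000
```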
bench
The bench command is used to execute performance test cases in test files and directly print the test results. The compiled artifacts are stored by default in the target/release/unittest_bin directory. Performance test cases are annotated with the @Bench macro. For more details on how to write performance test code, refer to the description of the std.unittest library in the Cangjie Programming Language Standard Library API.
This command can specify the path of a single package to test (multiple packages can be specified, e.g., cjpm bench path1 path2). If no path is specified, module-level unit tests are executed by default. Similar to test, when executing module-level unit tests, only the current module’s unit tests are performed by default. The bench command requires that the current project can be successfully compiled with build.
Like the test subcommand, if you have an xxx.cj file, xxx_test.cj can also contain performance test cases.
Input: cjpm bench
Output:
TP: bench, time elapsed: 8107939844 ns, RESULT:
TCS: Test_UT, time elapsed: 8107939844 ns, RESULT:
| Case | Median | Err | Err% | Mean |
|:-----------|---------:|------------:|-------:|---------:|
| Benchmark1 | 5.438 ns | ±0.00439 ns | ±0.1% | 5.420 ns |
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
--------------------------------------------------------------------------------------------------
Project tests finished, time elapsed: 8107939844 ns, RESULT:
TP: bench.*, time elapsed: 8107939844 ns, RESULT:
PASSED:
TP: bench, time elapsed: 8107939844 ns, RESULT:
Summary: TOTAL: 1
PASSED: 1, SKIPPED: 0, ERROR: 0
FAILED: 0
bench has several configurable options:
- -j, --jobs <N>: Specifies the maximum number of parallel compilation jobs. The final maximum concurrency is the minimum of N and 2 × CPU cores.
- -V, --verbose: When enabled, outputs unit test logs.
- -g: Generates a debug version of the unit test artifacts, which are stored in the target/debug/unittest_bin directory.
- -i, --incremental: Specifies incremental compilation for test code. Full compilation is performed by default.
- --no-run: Only compiles unit test artifacts without executing them.
- --skip-build: Only executes unit test artifacts without compiling them.
- --cfg: Allows passing custom cfg options from cjpm.toml.
- --module <value>: Specifies the target test module. The specified module must be directly or indirectly dependent on the current module (or be the module itself). Multiple modules can be specified using --module "module1 module2". If not specified, only the current module is tested.
- -m, --member <value>: Only usable in a workspace, specifies a single module to test.
- --target <value>: Specifies cross-compilation for the target platform. Refer to the cross-compile-configuration section in cjpm.toml for configuration details.
- --target-dir <value>: Specifies the output directory for unit test artifacts.
- --dry-run: Prints the test cases without executing them.
- --filter <value>: Filters a subset of tests. The value format is as follows:
  - --filter=*: Matches all test classes.
  - --filter=*.*: Matches all test cases in all test classes (same result as *).
  - --filter=*.*Test,*.*case*: Matches all test cases ending with Test or containing case in their names.
  - --filter=MyTest*.*Test,*.*case*,-*.*myTest: Matches all test cases in classes starting with MyTest and ending with Test, or containing case in their names, but excludes those containing myTest.
- --include-tags <value>: Runs tests annotated with the specified @Tag macro subsets. The value format is as follows:
  - --include-tags=Unittest: Runs all tests marked with @Tag[Unittest].
  - --include-tags=Unittest,Smoke: Runs all tests marked with either @Tag[Unittest] or @Tag[Smoke] (or both).
  - --include-tags=Unittest+Smoke: Runs all tests marked with both @Tag[Unittest, Smoke].
  - --include-tags=Unittest+Smoke+JiraTask3271,Backend: Runs all tests marked with @Tag[Backend] or @Tag[Unittest, Smoke, JiraTask3271].
- --exclude-tags <value>: Excludes tests annotated with the specified @Tag macro subsets. The value format is as follows:
  - --exclude-tags=Unittest: Runs all tests not marked with @Tag[Unittest].
  - --exclude-tags=Unittest,Smoke: Runs all tests not marked with either @Tag[Unittest] or @Tag[Smoke] (or both).
  - --exclude-tags=Unittest+Smoke+JiraTask3271: Runs all tests not marked with @Tag[Unittest, Smoke, JiraTask3271].
  - --include-tags=Unittest --exclude-tags=Smoke: Runs all tests marked with @Tag[Unittest] but not @Tag[Smoke].
- --no-color: Disables colored console output.
- --show-tags: Displays @Tag information from test cases in the test report. In --dry-run mode with the test report in xml format, Tag information will always be included.
- --random-seed <N>: Specifies the random seed value.
- --report-path <value>: Specifies the path for the generated report. Unlike the test subcommand, it defaults to bench_report.
- --report-format <value>: Specifies the report format. Performance test reports only support the csv and csv-raw formats.
- --baseline-path <value>: Path to an existing report for comparison with current performance results. By default, it uses the --report-path value.
- --skip-script: Skips the compilation and execution of build scripts.
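The tag expressions accepted by --include-tags and --exclude-tags combine two operators: a comma separates alternatives, and + joins tags that must all be present. The Python sketch below evaluates such expressions against the set of tags on a test. It is an interpretation of the examples in the option lists, not the framework's code; matches_tag_expr and should_run are hypothetical names.

```python
from typing import Optional, Set


def matches_tag_expr(tags: Set[str], expr: str) -> bool:
    """True if tags satisfy expr: ',' separates alternatives, '+' joins required tags."""
    return any(
        all(tag.strip() in tags for tag in group.split("+"))
        for group in expr.split(",")
        if group.strip()
    )


def should_run(tags: Set[str],
               include: Optional[str] = None,
               exclude: Optional[str] = None) -> bool:
    """Apply --include-tags first, then remove tests matched by --exclude-tags."""
    if include and not matches_tag_expr(tags, include):
        return False
    if exclude and matches_tag_expr(tags, exclude):
        return False
    return True


# A test tagged only with Backend is selected by the combined expression.
print(should_run({"Backend"}, include="Unittest+Smoke+JiraTask3271,Backend"))  # prints True
# A test tagged Unittest and Smoke is dropped when Smoke is excluded.
print(should_run({"Unittest", "Smoke"}, include="Unittest", exclude="Smoke"))  # prints False
```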
Example usage of cjpm bench options:
Input: cjpm bench
Output: cjpm bench success
Input: cjpm bench src
Output: cjpm bench success
Input: cjpm bench src --filter=*
Output: cjpm bench success
Input: cjpm bench src --report-format=csv
Output: cjpm bench success
Note:
cjpm bench does not fully support mock, to avoid any overhead from mock processing in the compiler during benchmarking. When using cjpm bench options, the compiler will not report errors if mock is used, allowing regular tests and benchmarks to be compiled together. However, avoid running benchmarks that use mock, as this will throw a runtime exception.
clean
The clean command removes temporary build artifacts (the target directory). It supports the short option -g to clean only debug artifacts and the long option --target-dir <value> to specify the directory to clean. Developers must ensure the safety of cleaning the specified directory. If cjpm build --coverage or cjpm test --coverage was used, it also removes the cov_output directory and *.gcno/*.gcda files in the current directory. The --skip-script option skips the compilation and execution of build scripts.
Examples:
Input: cjpm clean
Output: cjpm clean success
Input: cjpm clean --target-dir temp
Output: cjpm clean success
Note:
On Windows, cleaning executable files or parent directories immediately after subprocess execution may fail. If this occurs, retry the clean command after a short delay.
install
The install command installs a Cangjie project. It first compiles the project and then installs the artifacts to the specified path, naming them after the project (with .exe suffix on Windows). The installed project must be of type executable.
install has several configurable options:
- -V, --verbose: Shows installation logs.
- -m, --member <value>: Only usable in a workspace, specifies a single module to install.
- -g: Generates a debug version of the installation artifacts.
- --path <value>: Specifies the local project path to install. Defaults to the current directory.
- --root <value>: Specifies the installation path for executables. Defaults to $HOME/.cjpm on Linux/macOS and %USERPROFILE%/.cjpm on Windows.
- --git <value>: Specifies the Git URL of the project to install.
- --branch <value>: Specifies the Git branch to install.
- --tag <value>: Specifies the Git tag to install.
- --commit <value>: Specifies the Git commit ID to install.
- -j, --jobs <N>: Specifies the maximum number of parallel compilation jobs. The final maximum concurrency is the minimum of N and 2 × CPU cores.
- --cfg: Allows passing custom cfg options from cjpm.toml.
- --target-dir <value>: Specifies the output directory for compilation artifacts.
- --name <value>: Specifies the name of the installed artifact.
- --skip-build: Skips the compilation phase and directly installs artifacts. Requires the project to be already compiled and only works for local installations.
- --list: Prints the list of installed artifacts.
- --skip-script: Skips the compilation and execution of build scripts for the module to install.
Notes on install:
- Two installation methods are available: local project (via --path) and Git project (via --git). Only one can be configured; otherwise, install reports an error. If neither is specified, the local project in the current directory is installed by default.
- Incremental compilation is enabled by default.
- Git-related options (--branch, --tag, --commit) are ignored unless --git is specified. Priority: --commit > --branch > --tag.
- Existing executables with the same name will be replaced.
- Executables are installed to root/bin, where root is the specified or default installation path.
- Dynamic library dependencies are installed to root/libs, organized by program name. Add the corresponding directory to LD_LIBRARY_PATH (Linux), PATH (Windows), or DYLD_LIBRARY_PATH (macOS) for usage.
- The default installation path is added to PATH during envsetup.
- Git project installation removes the compilation artifacts directory afterward.
- If the project has only one executable artifact, --name renames it during installation. For multiple artifacts, --name installs only the specified artifact.
- --list prints installed artifacts, ignoring all options except --root. With --root, it prints artifacts in the specified path; otherwise, it uses the default path.
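The rules above for choosing the installation source (only one of --path and --git, Git reference options ignored without --git, and the priority --commit > --branch > --tag) can be summarized in a small Python sketch. It is a restatement of the documented rules, not cjpm's code; the function name resolve_install_source and its return shape are hypothetical.

```python
from typing import Optional, Tuple

Source = Tuple[str, str, Optional[Tuple[str, str]]]


def resolve_install_source(path: Optional[str] = None,
                           git: Optional[str] = None,
                           branch: Optional[str] = None,
                           tag: Optional[str] = None,
                           commit: Optional[str] = None) -> Source:
    """Return (kind, location, git_ref) following the documented install rules."""
    if path is not None and git is not None:
        raise ValueError("--path and --git cannot be configured together")
    if git is None:
        # Without --git, the Git reference options are ignored and the
        # current directory is the default local project.
        return ("local", path or ".", None)
    # Documented priority: --commit > --branch > --tag.
    for kind, ref in (("commit", commit), ("branch", branch), ("tag", tag)):
        if ref is not None:
            return ("git", git, (kind, ref))
    return ("git", git, None)


print(resolve_install_source(git="url", branch="dev", tag="v1"))
# prints ('git', 'url', ('branch', 'dev'))
```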
Examples:
cjpm install --path path/to/project # Installs from a local path
cjpm install --git url # Installs from a Git URL
uninstall
The uninstall command removes a Cangjie project, deleting its executables and dependency files.
uninstall requires the name parameter to specify the artifact to uninstall. Multiple names can be specified for sequential removal. The --root <value> option specifies the installation path to uninstall (defaults to $HOME/.cjpm on Linux/macOS and %USERPROFILE%/.cjpm on Windows). Artifacts in root/bin and dependencies in root/libs are removed.
Note:
cjpm does not support Chinese paths on Windows, and it does not support paths containing \ on Linux/macOS. If issues arise in either case, modify the directory name.
Module Configuration File Description
The cjpm.toml file configures basic information, dependencies, compilation options, etc. cjpm primarily uses this file for parsing and execution. Module names can be renamed in cjpm.toml, but package names cannot.
Example configuration:
[package] # Single-module configuration (cannot coexist with workspace)
cjc-version = "1.0.0" # Minimum required `cjc` version (required)
name = "demo" # Module name and root package name (required)
description = "nothing here" # Description (optional)
version = "1.0.0" # Module version (required)
compile-option = "" # Additional compilation options (optional)
override-compile-option = "" # Additional global compilation options (optional)
link-option = "" # Linker passthrough options (optional)
output-type = "executable" # Output type (required)
src-dir = "" # Source directory (optional)
target-dir = "" # Output directory (optional)
package-configuration = {} # Per-package configuration (optional)
[workspace] # Workspace configuration (cannot coexist with package)
members = [] # Workspace member modules (required)
build-members = [] # Modules to build (subset of members, optional)
test-members = [] # Modules to test (subset of build-members, optional)
compile-option = "" # Workspace-wide compilation options (optional)
override-compile-option = "" # Workspace-wide global compilation options (optional)
link-option = "" # Workspace-wide linker options (optional)
target-dir = "" # Output directory (optional)
[dependencies] # Source dependencies (optional)
coo = { git = "xxx", branch = "dev" } # Git dependency
doo = { path = "./pro1" } # Local source dependency
[test-dependencies] # Test-phase dependencies (same format as dependencies, optional)
[script-dependencies] # Build script dependencies (same format as dependencies, optional)
[replace] # Dependency replacement (same format as dependencies, optional)
[ffi.c] # C library dependencies (optional)
clib1.path = "xxx"
[profile] # Command profile configuration (optional)
build = {} # Build command options
test = {} # Test command options
bench = {} # Bench command options
customized-option = {} # Custom passthrough options
[target.x86_64-unknown-linux-gnu] # Platform-specific configuration (optional)
compile-option = "value1" # Compilation options for specific targets
override-compile-option = "value2" # Global compilation options for specific targets
link-option = "value3" # Linker options for specific targets
[target.x86_64-w64-mingw32.dependencies] # Platform-specific source dependencies (optional)
[target.x86_64-w64-mingw32.test-dependencies] # Platform-specific test dependencies (optional)
[target.x86_64-unknown-linux-gnu.bin-dependencies] # Binary library dependencies for specific targets (optional)
path-option = ["./test/pro0", "./test/pro1"] # Directory-based binary dependencies
[target.x86_64-unknown-linux-gnu.bin-dependencies.package-option] # File-based binary dependencies
"pro0.xoo" = "./test/pro0/pro0.xoo.cjo"
"pro0.yoo" = "./test/pro0/pro0.yoo.cjo"
"pro1.zoo" = "./test/pro1/pro1.zoo.cjo"
For detailed information about CJO files, see CJO Artifacts.
Unused fields default to empty (for paths, the default is the directory containing the configuration file).
“cjc-version”
Minimum required Cangjie compiler version. Must be compatible with the current environment. A valid version number consists of three natural numbers separated by ., with no leading zeros. Examples:
- 1.0.0: Valid.
- 1.00.0: Invalid (leading zero in 00).
- 1.2e.0: Invalid (2e is not a natural number).
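The version format described above (three natural numbers separated by ., with no leading zeros) maps directly onto a regular expression. The Python sketch below checks the three examples; it is an illustration of the stated rule rather than cjpm's own validator, and is_valid_version is a hypothetical name.

```python
import re

# One natural number without leading zeros: either 0 or a non-zero digit
# followed by any digits.
_NUMBER = r"(?:0|[1-9][0-9]*)"
_VERSION_RE = re.compile(rf"{_NUMBER}\.{_NUMBER}\.{_NUMBER}")


def is_valid_version(text: str) -> bool:
    """Check the 'three natural numbers separated by dots' format."""
    return _VERSION_RE.fullmatch(text) is not None


for candidate in ("1.0.0", "1.00.0", "1.2e.0"):
    print(candidate, is_valid_version(candidate))
# prints:
# 1.0.0 True
# 1.00.0 False
# 1.2e.0 False
```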
“name”
Current Cangjie module name, also the root package name. Must be a valid identifier (letters, numbers, underscores, starting with a letter). Examples: cjDemo, cj_demo_1.
Note:
Unicode characters are not supported. Module names must be ASCII-only identifiers.
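The naming rule for modules (ASCII letters, digits, and underscores, starting with a letter) can likewise be expressed as a pattern. This Python sketch follows the wording of this section only; cjpm may apply additional checks (for example, against reserved words) that are not modeled here, and is_valid_module_name is a hypothetical name.

```python
import re

# ASCII-only identifier that starts with a letter, as described in this section.
_MODULE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_module_name(name: str) -> bool:
    """Check a module name against the rule stated for the name field."""
    return _MODULE_NAME_RE.fullmatch(name) is not None


print(is_valid_module_name("cj_demo_1"))  # prints True
print(is_valid_module_name("1demo"))      # prints False
```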
“description”
Module description (free-form text).
“version”
Module version number, managed by the module owner. Format is the same as cjc-version.
“compile-option”
Additional compilation options passed to cjc. In multi-module builds, each module’s compile-option applies to all its packages.
Example:
compile-option = "-O1 -V"
The configured options are inserted into the compilation command during build. Multiple options can be separated by spaces. Refer to the Cangjie Programming Language Development Guide for available options.
“override-compile-option”
Additional global compilation options passed to cjc. In multi-module builds, the entry module’s override-compile-option applies to all dependent modules’ packages.
Example:
override-compile-option = "-O1 -V"
The configured options are appended after compile-option and take higher precedence. Refer to the Cangjie Programming Language Development Guide for available options.
Note:
- override-compile-option affects dependent modules. Ensure there are no conflicts with their compile-option settings.
- In workspaces, only the workspace’s override-compile-option applies to all modules.
“link-option”
Linker passthrough options, e.g., for secure compilation:
link-option = "-z noexecstack -z relro -z now --strip-all"
Note:
Only applies to dynamic libraries and executables.
“output-type”
Output artifact type: executable, static (static library), or dynamic (dynamic library). Defaults to executable for cjpm init.
| Input | Description |
|---|---|
| “executable” | Executable program |
| “static” | Static library |
| “dynamic” | Dynamic library |
| Others | Error report |
“src-dir”
This field can specify the source code storage path. If not specified, it defaults to the src directory.
“target-dir”
This field can specify the output directory for compiled artifacts. If not specified, it defaults to the target directory. If this field is not empty, executing cjpm clean will delete the directory specified by this field. Developers must ensure the safety of clearing this directory.
Note:
If the --target-dir option is specified during compilation, that option takes higher precedence.
target-dir = "temp"
“package-configuration”
Per-package configuration options for a module. This option is a map in which the name of the package to configure is the key and the package-specific configuration is the value. Currently configurable options are the output type and compilation options (output-type, compile-option). These options can be omitted and configured as needed. As shown below, the output type of the demo.aoo package in the demo module is set to a dynamic library, and the -g option is passed through to the demo.aoo package during compilation.
[package.package-configuration."demo.aoo"]
output-type = "dynamic"
compile-option = "-g"
If mutually compatible compilation options are configured in different fields, they appear in the generated command in the following order.
[package]
compile-option = "-O1"
[package.package-configuration.demo]
compile-option = "-O2"
# The profile field will be introduced later
[profile.customized-option]
cfg1 = "-O0"
Input: cjpm build --cfg1 -V
Output: cjc --import-path build -O0 -O1 -O2 ...
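The ordering visible in the example output above (the customized option first, then the module-level compile-option, then the package-level compile-option) can be mimicked with a few lines of Python. The sketch only reproduces the order observed in this one example; where override-compile-option or other sources fit relative to the package-level option is not modeled, and assemble_options is a hypothetical helper, not part of cjpm.

```python
import shlex
from typing import Dict, List


def assemble_options(module_option: str,
                     package_options: Dict[str, str],
                     package: str,
                     customized: Dict[str, str],
                     enabled_cfgs: List[str]) -> List[str]:
    """Order options as in the example: customized, module-level, package-level."""
    parts: List[str] = []
    for cfg in enabled_cfgs:
        parts += shlex.split(customized[cfg])
    parts += shlex.split(module_option)
    parts += shlex.split(package_options.get(package, ""))
    return parts


options = assemble_options(
    module_option="-O1",                 # [package] compile-option
    package_options={"demo": "-O2"},     # [package.package-configuration.demo]
    package="demo",
    customized={"cfg1": "-O0"},          # [profile.customized-option]
    enabled_cfgs=["cfg1"],               # cjpm build --cfg1
)
print(" ".join(options))  # prints -O0 -O1 -O2
```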
By configuring this field, multiple binary artifacts can be generated simultaneously (when generating multiple binary artifacts, the -o, --output <value> option will be invalid). Example:
Example of source code structure, with the module named demo:
src
├── aoo
│ └── aoo.cj
├── boo
│ └── boo.cj
├── coo
│ └── coo.cj
└── main.cj
Example of configuration:
[package.package-configuration."demo.aoo"]
output-type = "executable"
[package.package-configuration."demo.boo"]
output-type = "executable"
Example of multiple binary artifacts:
Input: cjpm build
Output: cjpm build success
Input: tree target/release/bin
Output: target/release/bin
|-- demo.aoo
|-- demo.boo
`-- demo
“workspace”
This field can manage multiple modules as a workspace, supporting the following configuration items:
- members = ["aoo", "path/to/boo"]: Lists local source code modules included in this workspace, supporting absolute and relative paths. Members of this field must be modules and cannot be another workspace.
- build-members = []: Modules to be compiled this time. If not specified, all modules in the workspace are compiled by default. Members of this field must be included in the members field.
- test-members = []: Modules to be tested this time. If not specified, unit tests are run on all modules in the workspace by default. Members of this field must be included in the build-members field.
- compile-option = "": Public compilation options for the workspace (optional).
- override-compile-option = "": Public global compilation options for the workspace (optional).
- link-option = "": Public linking options for the workspace (optional).
- target-dir = "": Output directory for the workspace (optional, defaults to target).
Public configuration items in the workspace apply to all member modules. For example: If a source dependency like [dependencies] xoo = { path = "path_xoo" } is configured, all member modules can directly use the xoo module without needing to configure it in each submodule’s cjpm.toml.
Note:
The package field is used to configure general module information and cannot coexist with the workspace field in the same cjpm.toml. All other fields except package can be used in the workspace.
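The membership constraints on build-members and test-members listed above can be checked mechanically. The following Python sketch only models those constraints; the function name workspace_errors and its arguments are assumptions made for illustration and are not part of cjpm.

```python
def workspace_errors(members, build_members=None, test_members=None):
    """Return a list of messages for violated workspace constraints.

    Models only the documented rules: build-members must come from members,
    and test-members must come from build-members. When build-members is not
    specified, all members are built, so test-members is checked against
    members instead.
    """
    errors = []
    built = members if build_members is None else build_members
    for name in built:
        if name not in members:
            errors.append(f"build-members entry not in members: {name}")
    for name in test_members or []:
        if name not in built:
            errors.append(f"test-members entry not in build-members: {name}")
    return errors


# The values used in the workspace example of this section.
print(workspace_errors(["aoo", "boo", "coo"], ["aoo", "boo"], ["aoo"]))
```

An empty list means that the configuration satisfies both constraints.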
Example of a workspace directory structure:
root_path
├── aoo
│ ├── src
│ └── cjpm.toml
├── boo
│ ├── src
│ └── cjpm.toml
├── coo
│ ├── src
│ └── cjpm.toml
└── cjpm.toml
Example of workspace configuration file usage:
[workspace]
members = ["aoo", "boo", "coo"]
build-members = ["aoo", "boo"]
test-members = ["aoo"]
compile-option = "-Woff all"
override-compile-option = "-O2"
[dependencies]
xoo = { path = "path_xoo" }
[ffi.c]
abc = { path = "libs" }
“dependencies”
This field imports other Cangjie modules as dependencies via source code, containing information about other modules required for the current build. Currently, it supports both local path dependencies and remote git dependencies.
To specify a local dependency, use the path field, which must contain a valid local path. For example, the code structure of the two submodules pro0 and pro1 and the main module is as follows:
├── pro0
│ ├── cjpm.toml
│ └── src
│ └── zoo
│ └── zoo.cj
├── pro1
│ ├── cjpm.toml
│ └── src
│ ├── xoo
│ │ └── xoo.cj
│ └── yoo
│ └── yoo.cj
├── cjpm.toml
└── src
├── aoo
│ └── aoo.cj
├── boo
│ └── boo.cj
└── main.cj
After configuring the main module’s cjpm.toml as follows, the pro0 and pro1 modules can be used in the source code:
[dependencies]
pro0 = { path = "./pro0" }
pro1 = { path = "./pro1" }
To specify a remote git dependency, use the git field, which must contain a valid url in any format supported by git. A git dependency should include at most one of the branch, tag, or commitId fields, which select a specific branch, tag, or commit hash, respectively. If more than one of these fields is configured, only the highest-priority one takes effect, with the priority order being commitId > branch > tag. For example, after configuring as follows, the pro0 and pro1 modules from the specified git repository can be used in the source code:
[dependencies]
pro0 = { git = "https://github.com/example", tag = "v1.0.0"}
pro1 = { git = "https://gitee.com/example", branch = "dev"}
In this case, cjpm will download the latest version of the corresponding repository and save the current commit-hash in the cjpm.lock file. All subsequent cjpm calls will use the saved version until cjpm update is executed.
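The priority order for branch, tag, and commitId described above can be modeled in a few lines of Python. This is only an illustrative sketch: the function name select_git_ref and the dictionary representation of a dependency entry are assumptions made for the example, not part of cjpm.

```python
# Documented priority: commitId > branch > tag.
PRIORITY = ("commitId", "branch", "tag")


def select_git_ref(dep):
    """Return (field, value) for the ref field that takes effect, or None.

    dep is a hypothetical dict holding one dependency entry from cjpm.toml.
    """
    for field in PRIORITY:
        if field in dep:
            return field, dep[field]
    return None  # no branch, tag, or commitId configured


dep = {"git": "https://gitee.com/example", "branch": "dev", "tag": "v1.0.0"}
print(select_git_ref(dep))
```

In this example both branch and tag are present, so the branch setting is the one selected.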
Authentication is often required to access git repositories. cjpm does not request credentials, so existing git authentication support should be used. If the protocol for git is https, an existing git credential helper must be used. On Windows, the credential helper is installed by default with git. On Linux/macOS, refer to the git-config documentation in the official git documentation for details on setting up a credential helper. If the protocol is ssh or git, key-based authentication should be used. If the key is protected by a passphrase, the developer must ensure that ssh-agent is running and the key is added via ssh-add before using cjpm.
The dependencies field can specify the compilation output type via the output-type attribute. The specified type can differ from the compilation output type of the source dependency itself and can only be static or dynamic, as shown below:
[dependencies]
pro0 = { path = "./pro0", output-type = "static" }
pro1 = { git = "https://gitee.com/example", output-type = "dynamic" }
After the above configuration, the output-type settings in the cjpm.toml files of pro0 and pro1 will be ignored, and the two modules’ outputs will be compiled into static and dynamic types, respectively.
“test-dependencies”
This field has the same format as the dependencies field. It is used to specify dependencies that are only used during testing and not required for building the main project. Module developers should use this field for dependencies that downstream users of this module do not need to be aware of.
Dependencies within test-dependencies can only be used in test files named like xxx_test.cj. These dependencies are not compiled during a regular (non-test) build. The configuration format of test-dependencies in cjpm.toml is the same as that of dependencies.
“script-dependencies”
This field has the same format as the dependencies field. It is used to specify dependencies that are only used during build script compilation and not required for building the main project. Build script-related features will be detailed in the Other-Build Scripts section.
“replace”
This field has the same format as the dependencies field. It is used to specify replacements for indirect dependencies with the same name. The configured dependencies will be the final versions used when compiling the module.
For example, the module aaa depends on a local module bbb:
[package]
name = "aaa"
[dependencies]
bbb = { path = "path/to/bbb" }
When the main module demo depends on aaa, bbb becomes an indirect dependency of demo. In this case, if the main module demo wants to replace bbb with another module of the same name (e.g., the bbb module under another path new/path/to/bbb), it can be configured as follows:
[package]
name = "demo"
[dependencies]
aaa = { path = "path/to/aaa" }
[replace]
bbb = { path = "new/path/to/bbb" }
After configuration, the actual indirect dependency bbb used when compiling the demo module will be the bbb module under new/path/to/bbb, and the bbb module under path/to/bbb configured in aaa will not be compiled.
Note:
Only the replace field of the entry module takes effect during compilation.
“ffi.c”
This field configures external C library dependencies for the current Cangjie module. It contains the information required to depend on the library, including the library name and path.
Developers need to compile the dynamic or static library themselves and place it under the specified path. Refer to the example below.
Instructions for calling external C dynamic libraries in Cangjie:
- Compile the corresponding hello.c file into a .so library (execute clang -shared -fPIC hello.c -o libhello.so in the file path).
- Modify the project’s cjpm.toml file to configure the ffi.c field, as shown in the example below. Here, ./src/ is the relative path of the compiled libhello.so to the current directory, and hello is the library name.
- Execute cjpm build to compile successfully.
[ffi.c]
hello = { path = "./src/" }
To specify C library configurations for different platforms, refer to target.
Note:
In multi-module scenarios on Windows systems, if multiple modules configure C libraries with the same name, the library loading strategy specific to Windows causes the system to load library files from the runtime directory first. As a result, the actual library file used may differ from the one used on other systems.
“profile”
profile is a command profile configuration item used to control default settings during command execution. Currently, the following scenarios are supported: build, test, bench, run, and customized-option.
“profile.build”
[profile.build]
lto = "full" # Whether to enable `LTO` (Link Time Optimization) compilation mode. This feature is only supported on target platforms of `Linux/OpenHarmony`.
performance_analysis = true # Enable compilation performance analysis.
incremental = true # Whether to enable incremental compilation by default.
[profile.build.combined]
demo = "dynamic" # Compile the module into a single dynamic library file. The key is the module name.
Compilation process control items. All fields are optional and will not take effect if not configured. Only the profile.build settings of the top-level module take effect.
The lto configuration can be full or thin, corresponding to two compilation modes supported by LTO optimization: full LTO merges all compilation modules for global optimization, offering the highest optimization potential but requiring longer compilation time; thin LTO uses parallel optimization across multiple modules and supports incremental compilation during linking by default, with shorter compilation time than full LTO but less optimization due to reduced global information.
The performance_analysis configuration can be true or false, indicating whether to enable compilation performance analysis. When enabled, cjpm generates .prof and .json files in the performance_analysis directory under the compilation output directory, recording time and memory consumption during compilation. For example, if the default compilation output directory is target and the compilation mode is debug, the directory structure is as follows:
demo
├── cjpm.toml
├── src
| └── demo.cj
└── target
└── debug
└── performance_analysis
├── xxx1.prof
├── ...
├── xxxN.prof
├── xxx1.json
├── ...
└── xxxN.json
The combined configuration is a key-value pair where the key is the module name (package.name) and the value is dynamic. Before configuring this, the module compiles each package into separate dynamic or static library files based on package.output-type. After configuration, the module’s compilation method changes to:
- Subpackages other than root are compiled as static libraries.
- The root package is compiled as a dynamic library, linking all subpackage static libraries, regardless of whether the subpackages are dependencies of the root package. When other modules depend on this dynamic library as a binary dependency, they can use all symbols from the subpackages.
For example, assume the module demo has the following structure:
demo
├── cjpm.toml
└── src
├── aoo
| └── aoo.cj
├── boo
| └── boo.cj
└── demo.cj
The module configuration file cjpm.toml is configured as follows:
[package]
name = "demo"
[profile.build.combined]
demo = "dynamic"
After compilation, the final output directory target/release/demo will contain the following files (using Linux as an example):
|-- libdemo.so
|-- libdemo.aoo.a
|-- libdemo.boo.a
|-- demo.cjo
|-- demo.aoo.cjo
|-- demo.boo.cjo
Module developers can provide all cjo files and the root package dynamic library libdemo.so to other modules as binary dependencies without providing the subpackage static library files. After other modules depend on this dynamic library, they can depend on all its subpackages in code, such as importing demo.aoo via import demo.aoo.
Note:
- When applying this configuration, compiling the root package dynamic library requires all subpackage static libraries, so ensure the root package is not directly or indirectly imported by its subpackages.
- Currently, the profile.build.combined configuration is experimental and unstable. Developers enabling this configuration should note the following limitations:
  - If a module configured with this field directly or indirectly depends on other source modules, those dependent modules must also be configured with this field.
  - For source modules that build scripts depend on, the profile.build.combined configuration does not take effect.
  - The profile.build.combined option is not supported when the compilation target platform is macOS.
If combined is enabled, cyclic dependencies not identifiable via imports may occur, resulting in cyclic dependency errors. Solutions are as follows:
- If the error message includes because of combined module 'demo', it means the module demo is configured as a combined module, and a subpackage of demo directly or indirectly depends on the root package. Developers can locate and remove such imports or disable the combined configuration to resolve the issue.
- If the error message includes between combined modules, it means both root packages in the entry are configured as combined modules, and there are mutual dependencies between them (including subpackages). Developers can locate and remove imports from one combined module to another or disable both combined configurations to resolve the issue.
“profile.test”
[profile.test] # Example usage
parallel = true
filter = "*.*"
no-color = true
timeout-each = "4m"
random-seed = 10
bench = false
report-path = "reports"
report-format = "xml"
verbose = true
[profile.test.build]
compile-option = ""
lto = "thin"
mock = "on"
[profile.test.env]
MY_ENV = { value = "abc" }
cjHeapSize = { value = "32GB", splice-type = "replace" }
PATH = { value = "/usr/bin", splice-type = "prepend" }
Test configuration supports specifying options during test compilation and execution. All fields are optional and will not take effect if not configured. Only the profile.test settings of the top-level module take effect. The option list aligns with the console execution options provided by cjpm test. If an option is configured in both the configuration file and the console, the console option takes precedence. profile.test supports the following runtime options:
- filter: Specifies the test case filter. The value is a string with the same format as the --filter value in the test command description.
- timeout-each <value>: The value format is %d[millis|s|m|h], specifying the default timeout for each test case.
- parallel: Specifies the parallel execution scheme for test cases. The value can be:
  - <BOOL>: true or false. If true, test classes can run in parallel, with the number of parallel processes controlled by the CPU cores available on the system.
  - nCores: The number of parallel test processes equals the available CPU cores.
  - NUMBER: The number of parallel test processes. Must be a positive integer.
  - NUMBERnCores: The number of parallel test processes is a multiple of the available CPU cores. Must be a positive number (supports integers or floats).
- option:<value>: Works with @Configure to define runtime options. For example:
  - random-seed: Specifies the random seed value. Must be a positive integer.
  - no-color: Specifies whether to disable colored output in the console. Can be true or false.
  - report-path: Specifies the path for test execution reports (cannot be configured via @Configure).
  - report-format: Specifies the report output format. Currently, unit test reports only support the xml and xml-per-package formats (case-insensitive). Other values will throw an exception (cannot be configured via @Configure). Performance test reports only support the csv and csv-raw formats.
  - verbose: Specifies whether to display detailed compilation information. Can be true or false.
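To make the timeout-each value format %d[millis|s|m|h] concrete, the following Python sketch converts such a value to milliseconds. It is a hypothetical model, not cjpm code: the function name parse_timeout is invented for the example, and the sketch assumes that the unit suffix is mandatory, which the text does not state explicitly.

```python
import re

# Milliseconds per unit in the documented format %d[millis|s|m|h].
UNIT_MS = {"millis": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def parse_timeout(text):
    """Convert a timeout-each value such as "4m" to milliseconds.

    Assumption: a unit suffix is required; cjpm's real parsing may differ.
    """
    match = re.fullmatch(r"(\d+)(millis|s|m|h)", text)
    if match is None:
        raise ValueError(f"invalid timeout-each value: {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_MS[unit]


print(parse_timeout("4m"))  # the value used in the example above
```

Under this reading, the "4m" setting from the example corresponds to 240000 milliseconds.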
“profile.test.build”
Specifies supported compilation options, including:
- compile-option: A string containing additional cjc compilation options, supplementing the top-level compile-option field.
- lto: Specifies whether to enable LTO optimization. Can be thin or full. This feature is only supported on target platforms of Linux/OpenHarmony.
- mock: Explicitly sets the mock mode. Possible values: on, off, runtime-error. The default value for test/build subcommands is on, and for bench subcommands, it is runtime-error.
“profile.test.env”
Configures temporary environment variables when running executables during the test command. The key is the name of the environment variable to configure, with the following options:
- value: Specifies the environment variable value.
- splice-type: Specifies how to splice the environment variable. Optional; defaults to absent. Possible values:
  - absent: The configuration only takes effect if no environment variable with the same name exists. If one exists, the configuration is ignored.
  - replace: The configuration replaces any existing environment variable with the same name.
  - prepend: The configuration is prepended to any existing environment variable with the same name.
  - append: The configuration is appended to any existing environment variable with the same name.
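The four splice-type behaviors can be compared side by side with a small Python model. This is a sketch under stated assumptions rather than cjpm's implementation: the function apply_env is hypothetical, and joining prepended or appended values with ":" (as in a Linux PATH) is an assumption, because the text does not say which separator is used.

```python
SPLICE_TYPES = ("absent", "replace", "prepend", "append")


def apply_env(env, name, value, splice_type="absent", sep=":"):
    """Return a copy of env with one environment entry applied.

    Assumption: prepend/append join the values with sep; the separator
    that cjpm really uses is not stated in the documentation.
    """
    if splice_type not in SPLICE_TYPES:
        raise ValueError(f"unknown splice-type: {splice_type}")
    result = dict(env)
    existing = result.get(name)
    if existing is None or splice_type == "replace":
        result[name] = value                   # nothing to keep
    elif splice_type == "prepend":
        result[name] = value + sep + existing  # new value goes first
    elif splice_type == "append":
        result[name] = existing + sep + value  # new value goes last
    # "absent" with an existing variable: the configuration is ignored
    return result


print(apply_env({"PATH": "/bin"}, "PATH", "/usr/bin", "prepend"))
```

The input mapping is left unmodified, which mirrors the temporary nature of these variables: they apply only to the executable being run.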
“profile.bench”
[profile.bench] # Example usage
no-color = true
random-seed = 10
report-path = "bench_report"
baseline-path = ""
report-format = "csv"
verbose = true
Benchmark configuration supports specifying options during benchmark compilation and execution. All fields are optional and will not take effect if not configured. Only the profile.bench settings of the top-level module take effect. The option list aligns with the console execution options provided by cjpm bench. If an option is configured in both the configuration file and the console, the console option takes precedence. profile.bench supports the following runtime options:
- filter: Specifies the benchmark case filter. The value is a string with the same format as the --filter value in the bench command description.
- option:<value>: Works with @Configure to define runtime options. For example:
  - random-seed: Specifies the random seed value. Must be a positive integer.
  - no-color: Specifies whether to disable colored output in the console. Can be true or false.
  - report-path: Specifies the path for benchmark execution reports (cannot be configured via @Configure).
  - report-format: Specifies the report output format. Currently, unit test reports only support the xml and xml-per-package formats (case-insensitive). Other values will throw an exception (cannot be configured via @Configure). Performance test reports only support the csv and csv-raw formats.
  - verbose: Specifies whether to display detailed compilation information. Can be true or false.
  - baseline-path: The path of an existing report to compare with the current performance results. By default, it uses the --report-path value.
“profile.bench.build”
Used to specify additional compilation options when building executables for cjpm bench. Shares the same configuration as profile.test.build.
“profile.bench.env”
Supports configuring environment variables when running executables with the bench command, following the same configuration method as profile.test.env.
“profile.run”
Options for running executables, supporting environment variable configuration env when executing the run command, following the same configuration method as profile.test.env.
“profile.customized-option”
[profile.customized-option]
cfg1 = "--cfg=\"feature1=lion, feature2=cat\""
cfg2 = "--cfg=\"feature1=tiger, feature2=dog\""
cfg3 = "-O2"
Custom options passed through to cjc. Enabled via --cfg1 --cfg3. The customized-option set for each module applies to all packages within that module. For example, when executing cjpm build --cfg1 --cfg3, the command passed to cjc would be --cfg="feature1=lion, feature2=cat" -O2.
Note:
The conditional value here must be a valid identifier.
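The passthrough behavior of customized-option can be pictured as a lookup from command-line flags to configured strings. The Python sketch below is a hypothetical model only: the function passthrough_options is invented for illustration, and emitting the options in command-line order is an assumption that happens to match the cjpm build --cfg1 --cfg3 example.

```python
# The [profile.customized-option] table from the example above.
CUSTOMIZED = {
    "cfg1": '--cfg="feature1=lion, feature2=cat"',
    "cfg2": '--cfg="feature1=tiger, feature2=dog"',
    "cfg3": "-O2",
}


def passthrough_options(cli_args, customized):
    """Collect the cjc options enabled by --<key> flags.

    Assumption: options are emitted in command-line order.
    """
    options = []
    for arg in cli_args:
        key = arg[2:] if arg.startswith("--") else None
        if key in customized:
            options.append(customized[key])
    return " ".join(options)


print(passthrough_options(["--cfg1", "--cfg3"], CUSTOMIZED))
```

Flags that do not name a configured key, such as -V, are simply not translated by this model.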
“target”
Multi-backend, multi-platform isolation options for configuring different settings across various backends and platforms. Taking the Linux system as an example, the target configuration is as follows:
[target.x86_64-unknown-linux-gnu] # Configuration items for Linux systems
compile-option = "value1" # Additional compilation command options
override-compile-option = "value2" # Additional global compilation command options
link-option = "value3" # Linker passthrough options
[target.x86_64-unknown-linux-gnu.dependencies] # Source dependency configuration
[target.x86_64-unknown-linux-gnu.test-dependencies] # Test-phase dependency configuration
[target.x86_64-unknown-linux-gnu.bin-dependencies] # Cangjie binary library dependencies
path-option = ["./test/pro0", "./test/pro1"]
[target.x86_64-unknown-linux-gnu.bin-dependencies.package-option]
"pro0.xoo" = "./test/pro0/pro0.xoo.cjo"
"pro0.yoo" = "./test/pro0/pro0.yoo.cjo"
"pro1.zoo" = "./test/pro1/pro1.zoo.cjo"
[target.x86_64-unknown-linux-gnu.ffi.c] # C language binary library dependencies
"ctest" = "./test/c"
[target.x86_64-unknown-linux-gnu.debug] # Debug configuration for Linux systems
[target.x86_64-unknown-linux-gnu.debug.test-dependencies]
[target.x86_64-unknown-linux-gnu.release] # Release configuration for Linux systems
[target.x86_64-unknown-linux-gnu.release.bin-dependencies]
Developers can add a series of configuration items for a specific target by configuring the target.target-name field. The target name can be obtained in the corresponding Cangjie environment via the command cjc -v, where the Target item in the command output represents the target name for that environment. The above example applies to the Linux system but is also applicable to other platforms, where the target name can similarly be obtained via cjc -v.
Dedicated configuration items for a specific target will apply to the compilation process for that target, as well as cross-compilation processes where other targets specify this target as the target platform. The list of configurable items includes:
- compile-option: Additional compilation command options
- override-compile-option: Additional global compilation command options
- link-option: Linker passthrough options
- dependencies: Source dependency configuration, structured similarly to the dependencies field
- test-dependencies: Test-phase dependency configuration, structured similarly to the test-dependencies field
- bin-dependencies: Cangjie binary library dependencies, described below
- ffi.c: Configuration for external C library dependencies in Cangjie modules, structured similarly to the ffi.c field
- compile-macros-for-target: Macro package control items for cross-compilation, which cannot be configured separately for the debug and release compilation modes described below
Developers can configure target.target-name.debug and target.target-name.release fields to specify additional configurations unique to debug and release compilation modes for that target. The configurable items are the same as above. Configurations under these fields will only apply to the corresponding compilation mode of the target.
“target.target-name[.debug/release].bin-dependencies”
This field is used to import pre-compiled Cangjie library output files suitable for the specified target. The following example demonstrates importing three packages from the pro0 and pro1 modules.
Note:
Unless specifically required, it is not recommended to use this field. Instead, use the dependencies field described earlier to import module source code.
├── test
│ ├── pro0
│ │ ├── libpro0.xoo.so
│ │ ├── pro0.xoo.cjo
│ │ ├── libpro0.yoo.so
│ │ └── pro0.yoo.cjo
│ └── pro1
│ ├── libpro1.zoo.so
│ └── pro1.zoo.cjo
├── src
│ └── main.cj
└── cjpm.toml
Method 1: Import via path-option:
[target.x86_64-unknown-linux-gnu.bin-dependencies]
path-option = ["./test/pro0", "./test/pro1"]
The path-option is a string array structure, where each element is a path to be imported. cjpm will automatically import all Cangjie library packages under that path that comply with the naming rule: the library file name should consist of the lib prefix followed by the full package name. For example, the library name corresponding to pro0.xoo.cjo in the above example should be libpro0.xoo.so or libpro0.xoo.a. Packages whose library names do not comply with this rule can only be imported via the package-option.
Method 2: Import via package-option:
[target.x86_64-unknown-linux-gnu.bin-dependencies.package-option]
"pro0.xoo" = "./test/pro0/pro0.xoo.cjo"
"pro0.yoo" = "./test/pro0/pro0.yoo.cjo"
"pro1.zoo" = "./test/pro1/pro1.zoo.cjo"
The package-option is a map structure, where pro0.xoo serves as the key (strings containing . in toml configuration files must be enclosed in ""), so the key value corresponds to libpro0.xoo.so. The path to the frontend file cjo serves as the value, and the corresponding .a or .so file for that cjo must be placed in the same path.
Note:
If the same package is imported via both package-option and path-option, the package-option field takes higher precedence.
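The precedence between package-option and path-option can be illustrated with a short Python model. Everything in it is hypothetical: the function resolve_cjo, the set of existing files standing in for the file system, and the choice to search path-option entries in list order are assumptions made for the sketch.

```python
def resolve_cjo(package, package_option, path_option, existing_files):
    """Return the cjo path used for a package, or None if it is not found.

    package-option wins when both options can provide the package.
    Assumption: path-option directories are searched in list order.
    """
    if package in package_option:
        return package_option[package]
    for directory in path_option:
        candidate = f"{directory}/{package}.cjo"
        if candidate in existing_files:
            return candidate
    return None


files = {"./test/pro0/pro0.xoo.cjo", "./other/pro0.xoo.cjo"}
print(resolve_cjo("pro0.xoo", {"pro0.xoo": "./other/pro0.xoo.cjo"},
                  ["./test/pro0"], files))
```

Here the package is reachable through both options, and the package-option entry is the one selected.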
The following code example in main.cj demonstrates calling the pro0.xoo, pro0.yoo, and pro1.zoo packages:
import pro0.xoo.*
import pro0.yoo.*
import pro1.zoo.*
main(): Int64 {
var res = x + y + z // x, y, z are values defined in pro0.xoo, pro0.yoo, and pro1.zoo respectively
println(res)
return 0
}
Note:
The dependent Cangjie dynamic library files may be compilation outputs of the root package generated by other modules through the profile.build.combined configuration, containing symbols for all its sub-packages. During dependency checking, if a package’s corresponding Cangjie library is not found, the root package corresponding to that package will be used as a dependency, and a warning will be printed. Developers must ensure that the root package imported in this way is generated via the corresponding method; otherwise, the library file may not contain symbols for sub-packages, leading to compilation errors. For example, if the source code imports the demo.aoo package via import demo.aoo, and the binary dependency does not contain the corresponding Cangjie library for that package, cjpm will attempt to find the dynamic library for the root package corresponding to that package, i.e., libdemo.so. If found, it will use that library as the dependency.
“target.target-name.compile-macros-for-target”
This field configures the cross-compilation method for macro packages, with the following three scenarios:
Method 1: By default, macro packages only compile outputs for the local platform during cross-compilation, not for the target platform. This applies to all macro packages within the module.
[target.target-platform]
compile-macros-for-target = ""
Method 2: During cross-compilation, outputs for both the local and target platforms are compiled. This applies to all macro packages within the module.
[target.target-platform]
compile-macros-for-target = "all" # The configuration item is a string, and the optional value must be "all"
Method 3: Specifies that certain macro packages within the module should compile outputs for both the local and target platforms during cross-compilation, while other unspecified macro packages follow the default mode of Method 1.
[target.target-platform]
compile-macros-for-target = ["pkg1", "pkg2"] # The configuration item is a string array, and the optional values are macro package names
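The three methods above amount to a simple decision per macro package. The following Python sketch models that decision; the function name compiles_for_target is hypothetical and the code is not taken from cjpm.

```python
def compiles_for_target(setting, macro_package):
    """Tell whether a macro package is also built for the target platform.

    setting mirrors the value of compile-macros-for-target:
    "" (Method 1), "all" (Method 2), or a list of names (Method 3).
    Output for the local platform is produced in every case.
    """
    if setting == "all":
        return True
    if isinstance(setting, list):
        return macro_package in setting
    if setting == "":
        return False
    raise ValueError(f"unsupported value: {setting!r}")


for setting in ("", "all", ["pkg1", "pkg2"]):
    print(repr(setting), compiles_for_target(setting, "pkg1"),
          compiles_for_target(setting, "pkg3"))
```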
Merging Rules for “target” Related Fields
Configuration items in target may coexist with other options in cjpm.toml. For example, the compile-option field can also exist in the package field, with the difference that the field in package applies to all targets. cjpm merges these duplicate fields in a specific way, combining all applicable configurations. Taking the debug compilation mode for x86_64-unknown-linux-gnu as an example, the target configuration is as follows:
[package]
compile-option = "compile-0"
override-compile-option = "override-compile-0"
link-option = "link-0"
[dependencies]
dep0 = { path = "./dep0" }
[test-dependencies]
devDep0 = { path = "./devDep0" }
[target.x86_64-unknown-linux-gnu]
compile-option = "compile-1"
override-compile-option = "override-compile-1"
link-option = "link-1"
[target.x86_64-unknown-linux-gnu.dependencies]
dep1 = { path = "./dep1" }
[target.x86_64-unknown-linux-gnu.test-dependencies]
devDep1 = { path = "./devDep1" }
[target.x86_64-unknown-linux-gnu.bin-dependencies]
path-option = ["./test/pro1"]
[target.x86_64-unknown-linux-gnu.bin-dependencies.package-option]
"pro1.xoo" = "./test/pro1/pro1.xoo.cjo"
[target.x86_64-unknown-linux-gnu.debug]
compile-option = "compile-2"
override-compile-option = "override-compile-2"
link-option = "link-2"
[target.x86_64-unknown-linux-gnu.debug.dependencies]
dep2 = { path = "./dep2" }
[target.x86_64-unknown-linux-gnu.debug.test-dependencies]
devDep2 = { path = "./devDep2" }
[target.x86_64-unknown-linux-gnu.debug.bin-dependencies]
path-option = ["./test/pro2"]
[target.x86_64-unknown-linux-gnu.debug.bin-dependencies.package-option]
"pro2.xoo" = "./test/pro2/pro2.xoo.cjo"
When target configuration items coexist with public configuration items in cjpm.toml or other levels of configuration items for the same target, they are merged according to the following priority:
- Configuration for the corresponding target in debug/release mode
- Configuration for the corresponding target unrelated to debug/release
- Public configuration items
In the above target configuration example, the target configuration items are merged according to the following rules:
- compile-option: All applicable configurations with the same name are concatenated in order of priority, with higher-priority configurations appended later. In this example, the final compile-option value in debug mode for x86_64-unknown-linux-gnu is compile-0 compile-1 compile-2, while in release mode it is compile-0 compile-1, and for other targets it is compile-0.
- override-compile-option: Same as above. Since override-compile-option has higher priority than compile-option, in the final compilation command, the concatenated override-compile-option will be placed after the concatenated compile-option.
- link-option: Same as above.
- dependencies: Source dependencies are merged directly, and conflicts will result in errors. In this example, the final dependencies in debug mode for x86_64-unknown-linux-gnu are dep0, dep1, and dep2, while in release mode only dep0 and dep1 are active. For other targets, only dep0 is active.
- test-dependencies: Same as above.
- bin-dependencies: Binary dependencies are merged by priority, with conflicts resolved by keeping only the higher-priority dependency. package-option configurations are added first for configurations with the same priority. In this example, in debug mode for x86_64-unknown-linux-gnu, binary dependencies from ./test/pro1 and ./test/pro2 are added, while in release mode only ./test/pro1 is added. Since bin-dependencies has no public configuration, no binary dependencies are active for other targets.
In cross-compilation scenarios for this example, if x86_64-unknown-linux-gnu is specified as the target platform on other platforms, the configuration for target.x86_64-unknown-linux-gnu will also be merged with public configuration items according to the above rules. If in debug mode, the configuration items for target.x86_64-unknown-linux-gnu.debug will also be applied.
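The merging rules for compile-option and dependencies can be reproduced with a small Python model that uses the values from the example above. It is a sketch, not cjpm's implementation: the names merge_option and merge_dependencies are hypothetical, and treating any repeated dependency name as a conflict is an assumption, since the text does not define what counts as a conflict.

```python
# Layers from the example, ordered from lowest to highest priority.
PUBLIC = {"compile-option": "compile-0", "dependencies": {"dep0": "./dep0"}}
TARGET = {"compile-option": "compile-1", "dependencies": {"dep1": "./dep1"}}
TARGET_DEBUG = {"compile-option": "compile-2", "dependencies": {"dep2": "./dep2"}}


def merge_option(layers, key):
    """Concatenate string options; higher-priority layers are appended later."""
    return " ".join(layer[key] for layer in layers if layer.get(key))


def merge_dependencies(layers):
    """Merge source dependencies; a repeated name is treated as a conflict."""
    merged = {}
    for layer in layers:
        for name, path in layer.get("dependencies", {}).items():
            if name in merged:
                raise ValueError(f"conflicting dependency: {name}")
            merged[name] = path
    return merged


debug_layers = [PUBLIC, TARGET, TARGET_DEBUG]
release_layers = [PUBLIC, TARGET]  # no release-specific layer in the example
print(merge_option(debug_layers, "compile-option"))
print(sorted(merge_dependencies(release_layers)))
```

Other targets would use only the public layer, which yields compile-0 and dep0 as stated in the rules.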
Environment Variable Configuration
Environment variables can be used in cjpm.toml to configure field values. cjpm will retrieve the corresponding environment variable values from the current runtime environment and substitute them into the actual configuration values. For example, the following dependencies field uses an environment variable for path configuration:
[dependencies]
aoo = { path = "${DEPENDENCY_PATH}/aoo" }
When importing module aoo, cjpm will retrieve the DEPENDENCY_PATH variable value and substitute it to obtain the final path for module aoo.
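The substitution step can be sketched in Python as follows. The function expand_env is a hypothetical stand-in for what cjpm does internally, and raising an error for an undefined variable is an assumption, because the text does not describe that case.

```python
import re


def expand_env(value, env):
    """Replace each ${NAME} in value with env[NAME].

    Assumption: an undefined variable is an error (KeyError here).
    """
    return re.sub(r"\$\{(\w+)\}", lambda m: env[m.group(1)], value)


# /opt/deps is an arbitrary example value for DEPENDENCY_PATH.
print(expand_env("${DEPENDENCY_PATH}/aoo", {"DEPENDENCY_PATH": "/opt/deps"}))
```

In practice the mapping would come from the process environment, for example os.environ.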
The list of fields that support environment variable configuration includes:
- The following fields in the single-module configuration field package:
  - Single-package compilation option compile-option in package-configuration
- The following fields in the workspace management field workspace:
  - Member module list members
  - Compilation module list build-members
  - Test module list test-members
- The following fields common to both package and workspace:
  - Compilation option compile-option
  - Global compilation option override-compile-option
  - Linking option link-option
  - Output directory path target-dir
- The path field for local dependencies in the build dependency list dependencies
- The path field for local dependencies in the test dependency list test-dependencies
- The path field for local dependencies in the build script dependency list script-dependencies
- Custom passthrough options customized-option in the command profile configuration profile
- The path field in external C library configuration ffi.c
- The following fields in the platform isolation option target:
  - Compilation option compile-option
  - Global compilation option override-compile-option
  - Linking option link-option
  - The path field for local dependencies in the build dependency list dependencies
  - The path field for local dependencies in the test dependency list test-dependencies
  - The path field for local dependencies in the build script dependency list script-dependencies
  - The path-option and package-option fields in the binary dependency field bin-dependencies
Project Management Configuration File Specification
The project management configuration file, cangjie-repo.toml, is utilized to configure settings including the central repository URL and local repository cache. The cjpm tool primarily leverages this file to interface with the central repository and manage dependency modules downloaded from the central repository.
The cangjie-repo.toml file can be configured in three locations. When executing the cjpm command, it reads the configuration files in the following priority order from highest to lowest:
- A cangjie-repo.toml file located alongside cjpm.toml, in the cjpm module directory where the command is executed.
- A cangjie-repo.toml file under the .cjpm directory of the user’s home directory.
  - For Linux/macOS: $HOME/.cjpm
  - For Windows: %USERPROFILE%/.cjpm
- A cangjie-repo.toml file in the Cangjie SDK directory at the path tools/config/cangjie-repo.toml.
Upon successfully locating a valid cangjie-repo.toml file, cjpm will utilize this file as the configuration source for the current command execution and will disregard all configuration files of lower precedence.
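The lookup order can be sketched in Python as below. This is an illustration under stated assumptions rather than cjpm's code: the function name find_repo_config and its parameters are hypothetical, and the check for a "valid" file is reduced to a plain existence check.

```python
from pathlib import Path


def find_repo_config(module_dir, home_dir, sdk_dir):
    """Return the highest-priority cangjie-repo.toml that exists, or None."""
    candidates = [
        Path(module_dir) / "cangjie-repo.toml",                       # next to cjpm.toml
        Path(home_dir) / ".cjpm" / "cangjie-repo.toml",               # user's home directory
        Path(sdk_dir) / "tools" / "config" / "cangjie-repo.toml",     # Cangjie SDK
    ]
    for candidate in candidates:
        # Assumption: "valid" is approximated by "the file exists".
        if candidate.is_file():
            return candidate
    return None
```

Because the first match wins, a file in the module directory hides both the home-directory file and the SDK file.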
The configuration file format is as follows:
[repository.cache]
path = "/path/to/repository/cache"
[repository.home]
registry = "central/repo/url"
token = "user-token"
[global]
strict-tls = true
The configuration content is described as follows:
- repository.home is used to configure the central repository URL and the user’s personal token. The cjpm tool interacts with the central repository address specified in the registry field, and all interaction requests will include the user’s token information for authentication.
- repository.cache is used to configure the local path for storing source code modules downloaded from the central repository or Git. Environment variables can be used to configure this field value; refer to Environment Variable Configuration. If not configured, it defaults to the .cjpm directory in the user’s home directory. Once the local path is determined, Git source code modules are downloaded to the git subdirectory under this path, and central repository source code modules are downloaded to the repository/source subdirectory under this path.
- global.strict-tls is used to configure the TLS certificate verification method. The default is normal verification; when set to false, certificate verification is disabled.
Configuration and Cache Directories
The storage path for files downloaded by cjpm via git can be specified using the CJPM_CONFIG environment variable. If not specified, the default location on Linux/macOS is $HOME/.cjpm, and on Windows it is %USERPROFILE%/.cjpm.
Note:
- This configuration functions identically to repository.cache in cangjie-repo.toml. It only takes effect if no valid cangjie-repo.toml configuration exists, or if the valid configuration is the one located at tools/config/cangjie-repo.toml within the Cangjie SDK.
- This configuration is deprecated and will be removed in a future release. Please use cangjie-repo.toml instead.
Cangjie Package Management Specification
In the Cangjie package management specification, for a file directory to be recognized as a valid source package, the following requirements must be met:
- It must directly contain at least one Cangjie code file;
- Its parent package (including the parent’s parent package, up to the root package) must also be a valid source package. Note that the module root package has no parent package, so it only needs to satisfy the first condition.
For example, consider the following cjpm project named demo:
demo
├── src
│   ├── main.cj
│   └── pkg0
│       ├── aoo
│       │   └── aoo.cj
│       └── boo
│           └── boo.cj
└── cjpm.toml
Here, the directory corresponding to demo.pkg0 does not directly contain any Cangjie code, so demo.pkg0 is not a valid source package. Although demo.pkg0.aoo and demo.pkg0.boo directly contain Cangjie code files aoo.cj and boo.cj, their upstream package demo.pkg0 is not a valid source package, so these two packages are also not valid source packages.
When cjpm identifies a package like demo.pkg0 that does not directly contain Cangjie files, it treats it as a non-source package, ignores all its subpackages, and prints the following warning:
Warning: there is no '.cj' file in directory 'demo/src/pkg0', and its subdirectories will not be scanned as source code
Therefore, if developers need to configure a valid source package, the package must directly contain at least one Cangjie code file, and all its upstream packages must be valid source packages. Taking the above demo project as an example, to make demo.pkg0, demo.pkg0.aoo, and demo.pkg0.boo all recognized as valid source packages, a Cangjie code file can be added inside demo/src/pkg0, as shown below:
demo
├── src
│   ├── main.cj
│   └── pkg0
│       ├── pkg0.cj
│       ├── aoo
│       │   └── aoo.cj
│       └── boo
│           └── boo.cj
└── cjpm.toml
demo/src/pkg0/pkg0.cj must be a Cangjie code file that complies with the package management specification. It does not need to contain any functional code; a file of the following form is sufficient:
package demo.pkg0
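The two validity conditions amount to a top-down directory walk that stops descending as soon as a directory has no .cj file of its own. The Python sketch below illustrates that rule; it is not how cjpm is implemented, and the function name valid_source_packages is hypothetical.

```python
from pathlib import Path


def valid_source_packages(src_root, root_name):
    """List package names under src_root that satisfy both conditions."""
    found = []

    def visit(directory, name):
        entries = sorted(directory.iterdir())
        has_cj_file = any(p.is_file() and p.suffix == ".cj" for p in entries)
        if not has_cj_file:
            # Not a valid source package, so its subdirectories are ignored.
            return
        found.append(name)
        for child in entries:
            if child.is_dir():
                visit(child, f"{name}.{child.name}")

    visit(Path(src_root), root_name)
    return found
```

For the first demo layout above, calling valid_source_packages("demo/src", "demo") would yield only demo; after pkg0.cj is added, demo.pkg0, demo.pkg0.aoo, and demo.pkg0.boo are included as well.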
Command Extension
cjpm provides a command extension mechanism, allowing developers to extend cjpm commands via executable files named in the format cjpm-xxx(.exe).
For an executable file cjpm-xxx (cjpm-xxx.exe on Windows), if the file’s directory is configured in the system environment variable PATH, the following command can be used to execute it:
cjpm xxx [args]
Here, args represents the list of arguments that may be required by cjpm-xxx(.exe). The above command is equivalent to:
cjpm-xxx(.exe) [args]
Running cjpm-xxx(.exe) may depend on certain dynamic libraries. In such cases, developers need to manually add the directory containing the required dynamic libraries to the environment variables.
Below is an example using cjpm-demo, an executable file compiled from the following Cangjie code:
import std.process.*
import std.collection.*
main(): Int64 {
    var args = ArrayList<String>(Process.current.arguments)
    if (args.size < 1) {
        eprintln("Error: failed to get parameters")
        return 1
    }
    println("Output: ${args[0]}")
    return 0
}
After adding its directory to PATH, running the corresponding command will execute the file and produce the expected output.
Input: cjpm demo hello,world
Output: Output: hello,world
Built-in cjpm commands have higher priority, so they cannot be extended this way. For example, even if an executable file named cjpm-build exists in a directory listed in PATH, cjpm build will not execute that file but will instead run cjpm with build as an argument.
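The dispatch rule described in this section can be sketched in Python. This is an illustration only: resolve_command is a hypothetical name, the BUILTIN_COMMANDS set lists just the commands mentioned in this document and is not a complete list, and the real lookup inside cjpm may differ.

```python
import shutil

# Commands named in this document; assumed to be only a subset of the built-ins.
BUILTIN_COMMANDS = {
    "init", "build", "run", "test", "bench",
    "clean", "check", "tree", "update", "install",
}


def resolve_command(name, args, which=shutil.which):
    """Decide whether `cjpm <name>` is a built-in or an extension on PATH."""
    if name in BUILTIN_COMMANDS:
        # Built-in commands always win, even if cjpm-<name> exists on PATH.
        return ("builtin", name, list(args))
    executable = which(f"cjpm-{name}")
    if executable is None:
        raise LookupError(f"no built-in command or cjpm-{name} executable found")
    return ("extension", executable, list(args))
```

On Windows, shutil.which also considers the extensions in PATHEXT, so cjpm-demo.exe is found when searching for cjpm-demo.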
Build Scripts
cjpm provides a build script mechanism, allowing developers to define behaviors for cjpm before or after executing certain commands.
The build script source file is fixed as build.cj and is located in the Cangjie project’s root directory, at the same level as cjpm.toml. When creating a new Cangjie project using the init command, cjpm does not create build.cj by default. Developers who need it can manually create and edit build.cj in the specified location using the following template format:
// build.cj
import std.process.*
// Case of pre/post codes for 'cjpm build'.
/* called before `cjpm build`
* Success: return 0
* Error: return any number except 0
*/
// func stagePreBuild(): Int64 {
// // process before "cjpm build"
// 0
// }
/*
* called after `cjpm build`
*/
// func stagePostBuild(): Int64 {
// // process after "cjpm build"
// 0
// }
// Case of pre codes for 'cjpm clean'.
/* called before `cjpm clean`
* Success: return 0
* Error: return any number except 0
*/
// func stagePreClean(): Int64 {
// // process before "cjpm clean"
// 0
// }
// For other options, define stagePreXXX and stagePostXXX in the same way.
/*
* Error code:
* 0: success.
* other: cjpm will finish running command. Check target-dir/build-script-cache/module-name/script-log for error outputs defined by user in functions.
*/
main(): Int64 {
match (Process.current.arguments[0]) {
// Add operation here with format: "pre-"/"post-" + optionName
// case "pre-build" => stagePreBuild()
// case "post-build" => stagePostBuild()
// case "pre-clean" => stagePreClean()
case _ => 0
}
}
cjpm supports using build scripts to define pre- and post-command behaviors for a series of commands. For example, for the build command, you can add a pre-build case to the match block in the main function and have it call the function that implements the pre-build behavior, stagePreBuild (the function name is not restricted). Post-build behavior can be defined in the same way by adding a post-build case. Other commands can be extended similarly by adding the corresponding pre/post cases and functions.
After defining pre- and post-command behaviors, cjpm will first compile build.cj when executing the command and then execute the corresponding behaviors before and after the command. For example, with pre-build and post-build defined, running cjpm build will follow these steps:
- Before the build process, compile build.cj;
- Execute the functionality function corresponding to pre-build;
- Proceed with the cjpm build compilation process;
- After successful compilation, cjpm will execute the functionality function corresponding to post-build.
The commands supported by build scripts are as follows:
- build, test, bench: Support executing both pre and post processes defined in dependent modules’ build scripts.
- run, install: Only support running the pre and post build script processes of the corresponding module or executing the pre-build and post-build processes of dependent modules during compilation.
- check, tree, update: Only support running the pre and post build script processes of the corresponding module.
- clean: Only support running the pre build script process of the corresponding module.
When executing these commands, if the --skip-script option is configured, all build script compilation and execution will be skipped, including those of dependent modules.
Usage notes for build scripts:
- The return value of functionality functions must meet certain requirements: a successful execution should return 0, while a failure should return any Int64 value except 0.
- All outputs from build.cj will be redirected to the project directory at build-script-cache/[target|release]/[module-name]/bin/script-log. Developers can check this file for output content added in functionality functions.
- If build.cj does not exist in the project root directory, cjpm will proceed with normal execution. If build.cj exists and defines pre- or post-command behaviors, the command will abort abnormally if build.cj fails to compile or the functionality function returns a non-zero value, even if the command itself could execute successfully.
- In multi-module scenarios, the build scripts (build.cj) of dependent modules take effect during compilation and unit testing. Outputs from dependent module build scripts are also redirected to log files in the corresponding module directory under build-script-cache/[target|release].
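The ordering and abort rules above can be modelled with a short Python sketch. It is a simplified model, not cjpm's implementation: run_with_build_script and its parameters are hypothetical, compiling build.cj is assumed to have happened already, and returning the failing stage's exit code is an assumption about how an abort is reported.

```python
def run_with_build_script(command, run_command, script=None, skip_script=False):
    """Run `command` with optional pre/post stages from a build script.

    script is a callable taking a stage name such as "pre-build" and
    returning an exit code; None models a project without build.cj.
    """
    if script is None or skip_script:
        return run_command()

    code = script(f"pre-{command}")
    if code != 0:
        return code  # pre stage failed: the command is aborted

    code = run_command()
    if code != 0:
        return code  # post stage only runs after a successful command

    if command == "clean":
        return 0  # clean supports only the pre stage

    return script(f"post-{command}")


# Usage: record the order of events for `build`.
events = []
result = run_with_build_script(
    "build",
    run_command=lambda: events.append("build") or 0,
    script=lambda stage: events.append(stage) or 0,
)
print(events)  # ['pre-build', 'build', 'post-build']
```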
For example, the following build script build.cj defines pre- and post-build behaviors:
import std.process.*
func stagePreBuild(): Int64 {
println("PRE-BUILD")
0
}
func stagePostBuild(): Int64 {
println("POST-BUILD")
0
}
main(): Int64 {
match (Process.current.arguments[0]) {
case "pre-build" => stagePreBuild()
case "post-build" => stagePostBuild()
case _ => 0
}
}
When executing cjpm build, cjpm will execute stagePreBuild and stagePostBuild. After cjpm build completes, the script-log file will contain the following output:
PRE-BUILD
POST-BUILD
Build scripts can import dependent modules via the script-dependencies field in cjpm.toml, with the same format as dependencies. For example, the following configuration in cjpm.toml imports the aoo module, which contains a method named aaa():
[script-dependencies]
aoo = { path = "./aoo" }
The build script can then import this dependency and use the interface aaa():
import std.process.*
import aoo.*
func stagePreBuild(): Int64 {
aaa()
0
}
func stagePostBuild(): Int64 {
println("POST-BUILD")
0
}
main(): Int64 {
match (Process.current.arguments[0]) {
case "pre-build" => stagePreBuild()
case "post-build" => stagePostBuild()
case _ => 0
}
}
Build script dependencies (script-dependencies) are independent of source code-related dependencies (dependencies and test-dependencies). Source and test code cannot use modules from script-dependencies, and build scripts cannot use modules from dependencies or test-dependencies. If the same module is needed in both build scripts and source/test code, it must be configured in both script-dependencies and dependencies/test-dependencies.
Usage Examples
The following example demonstrates how to use cjpm with a Cangjie project directory structure. The corresponding source code can be found in the Example Source Code section. The module name for this Cangjie project is test.
cj_project
├── pro0
│   ├── cjpm.toml
│   └── src
│       ├── zoo
│       │   ├── zoo.cj
│       │   └── zoo_test.cj
│       └── pro0.cj
├── src
│   ├── koo
│   │   ├── koo.cj
│   │   └── koo_test.cj
│   ├── main.cj
│   └── main_test.cj
└── cjpm.toml
Using init and build
- Create a new Cangjie project and write source code xxx.cj files, such as the koo package and main.cj file shown in the example structure.
  cjpm init --name test --path ./cj_project
  cd cj_project
  mkdir src/koo
  At this point, a cj_project directory will be created in the current command execution directory, and the src folder along with the default cjpm.toml configuration file will be automatically generated within it. Developers can manually create sub-packages (e.g., src/koo) in the source code directory src, or add new source files and test files in each package as needed.
- When the current module depends on an external pro0 module, create the pro0 module and its configuration file. Then write the module’s source code files, manually creating the src directory under pro0, and place the Cangjie packages under src, such as the zoo package in the example structure.
  mkdir pro0 && cd pro0
  cjpm init --name pro0 --type=static
  mkdir src/zoo
- When the main module depends on pro0, configure the dependencies field in the main module’s configuration file as described in the manual. After correct configuration, execute cjpm build. The generated executable will be in the target/release/bin/ directory.
  cd cj_project
  vim cjpm.toml
  cjpm build
  cjpm run
Using test and clean
- After writing the corresponding xxx_test.cj unit test files for each file as shown in the example structure, execute the following command to run unit tests. The generated files will be in the target/release/unittest_bin directory.
  cjpm test
  Or:
  cjpm test src src/koo pro0/src/zoo
- To manually delete intermediate files such as the target and cov_output directories, *.gcno, and *.gcda, execute:
  cjpm clean
Example Source Code
// cj_project/src/main.cj
package test
import pro0.zoo.*
import test.koo.*
main(): Int64 {
let res = z + k
println(res)
let res2 = concatM("a", "b")
println(res2)
return 0
}
func concatM(s1: String, s2: String): String {
return s1 + s2
}
// cj_project/src/main_test.cj
package test
import std.unittest.* // test framework
import std.unittest.testmacro.* // macro definition
@Test
public class TestM{
@TestCase
func sayhi(): Unit {
@Assert(concatM("1", "2"), "12")
@Assert(concatM("1", "3"), "13")
}
}
// cj_project/src/koo/koo.cj
package test.koo
public let k: Int32 = 12
func concatk(s1: String, s2: String): String {
return s1 + s2
}
// cj_project/src/koo/koo_test.cj
package test.koo
import std.unittest.* // test framework
import std.unittest.testmacro.* // macro definition
@Test
public class TestK{
@TestCase
func sayhi(): Unit {
@Assert(concatk("1", "2"), "12")
@Assert(concatk("1", "3"), "13")
}
}
// cj_project/pro0/src/pro0.cj
package pro0
// cj_project/pro0/src/zoo/zoo.cj
package pro0.zoo
public let z: Int32 = 26
func concatZ(s1: String, s2: String): String {
return s1 + s2
}
// cj_project/pro0/src/zoo/zoo_test.cj
package pro0.zoo
import std.unittest.* // test framework
import std.unittest.testmacro.* // macro definition
@Test
public class TestZ{
@TestCase
func sayhi(): Unit {
@Assert(concatZ("1", "2"), "12")
@Assert(concatZ("1", "3"), "13")
}
}
# cj_project/cjpm.toml
[package]
cjc-version = "1.0.0"
description = "nothing here"
version = "1.0.0"
name = "test"
output-type = "executable"
[dependencies]
pro0 = { path = "pro0" }
# cj_project/pro0/cjpm.toml
[package]
cjc-version = "1.0.0"
description = "nothing here"
version = "1.0.0"
name = "pro0"
output-type = "static"
Appendix
Cross-Compilation Instructions
cjpm supports cross-compilation and execution between certain platforms. For example, assuming the target platform is arch-sys-abi, the compilation steps are as follows:
- Configure the toolchain required for the target platform.
- In the cjpm.toml of the entry module, add the compilation option configuration needed for the target platform:
  [target.arch-sys-abi]
  override-compile-option = "value"
  This configuration will apply to all dependent modules. In single-module compilation mode, it can be replaced with compile-option.
- If the project has binary dependencies, configure them as follows:
  [target.arch-sys-abi.bin-dependencies]
  path-option = [...]
  [target.arch-sys-abi.bin-dependencies.package-option]
  "..." = "..."
- Use the following commands to compile and build or test the code:
  cjpm build --target=arch-sys-abi # Cross-compile artifacts for the target platform
  cjpm test --target=arch-sys-abi # Cross-compile executable test files for the target platform
- Import the binary artifacts into the target platform for normal execution.
Note:
- The compiled artifacts are located in the target-dir directory configured by the user, under a subdirectory named after the target platform (target).
- If dynamic library dependencies exist, configure them in the runtime environment variable LD_LIBRARY_PATH.
Multi-Platform Build Instructions
To support multi-platform project builds, cjpm introduces new entities such as features and source-sets to enhance development efficiency for multi-platform projects. This is an experimental feature and requires specifying experimental = true in the [profile] field.
Feature
A Feature is a named flag used to specify the source code to be compiled. Below is a list of all available features in cjpm:
feature.os.posix
feature.os.epoll
feature.os.kqueue
feature.os.windows
feature.os.linux
# OS is harmony
feature.os.hm
feature.os.darwin
# The CPU architecture is `big-endian`
feature.arch.big
# The CPU architecture is `little-endian`
feature.arch.little
feature.arch.x64
feature.arch.aarch64
# The CPU uses `sse` instruction set
feature.arch.sse1
feature.arch.sse2
feature.arch.sse3
feature.arch.sse4.1
feature.arch.sse4.2
# The CPU uses `avx` instruction set
feature.arch.avx1
feature.arch.avx2
feature.arch.avx512
feature.arch.neon
feature.env.ohos
feature.env.gnu
feature.env.mingw32
feature.env.hos
feature.env.android
feature.cj.cjnative
feature.cj.v0_54_3 # Anything after `v` is interpreted as `cjc` version
These values are not validated in any way. It is the developer’s responsibility to ensure that the code compiled with these features executes correctly.
Developers can define custom feature values and meanings.
Developers can use the --enable-features option in cjpm’s build, run, and test commands, providing a comma-separated list of feature values.
For example:
cjpm build --enable-features=feature.os.linux,feature.env.gnu
Feature Deduction
The following feature values can be inferred from the cjc or other compilation options (e.g., --target) used in cjpm, so they typically do not need to be explicitly specified:
feature.os.posix
feature.os.windows
feature.os.linux
feature.os.hm
feature.os.darwin
feature.arch.x64
feature.arch.aarch64
feature.env.ohos
feature.env.gnu
feature.env.mingw32
feature.env.hos
feature.env.android
feature.cj.cjvm
feature.cj.cjnative
feature.cj.v0_54_3
If developing a multi-platform project on a GNU/Linux machine, the source code can be compiled and run using the following commands:
# Short command
cjpm run
# Full command
cjpm run --enable-features=feature.os.linux,feature.env.gnu
For cross-compilation scenarios, specify --target.
For example:
# Short command
cjpm build --target=aarch64-linux-android
# Full command
cjpm build --target=aarch64-linux-android --enable-features=feature.arch.aarch64,feature.os.linux,feature.env.android
If cjpm deduces multiple source directories (resulting in a compilation error), use the --no-feature-deduce option to disable deduction and explicitly specify features with --enable-features:
# Will display the error message "No source set was selected"
cjpm build --target=aarch64-linux-android --no-feature-deduce
# Specify appropriate feature values:
# 1. [..., "feature.os.linux", "feature.env.android"]
# 2. [..., "feature.os.linux"]
# 3. [..., "feature.env.android"]
cjpm build --no-feature-deduce --target=aarch64-linux-android --enable-features=feature.os.linux,feature.env.android
Source Sets and Their Configuration
The default cjpm.toml is as follows:
[package]
cjc-version = "0.57.1"
compile-option = ""
description = "nothing here"
link-option = ""
name = "cmp_lib"
output-type = "dynamic"
override-compile-option = ""
target-dir = ""
version = "1.0.0"
package-configuration = {}
# New field indicating a multi-platform project
# [source-set]
For the source-set field configuration, an example is provided below:
# Example syntax:
# Source file directory for Cangjie code
[source-set.epoll]
src-dir = "src/net/select/epoll"
condition = [ "feature.os.epoll" ]
[source-set.kqueue]
src-dir = "src/net/select/kqueue"
condition = [ "feature.os.kqueue" ]
Each source set declaration consists of three parts: the source-set identifier, src-dir, and condition.
Source-Set Identifier
The source-set identifier is the unique “path” of the source set. This “path” is separated by ., and each path should start with source-set as it is the root of all declarations.
src-dir
Specifies the location to search for the package’s source code when this source set is enabled.
# Possible syntax
# 1. Single directory
[source-set.${source set fully qualified name}]
src-dir = "./src/linux/common"
# 2. Multiple directories
# All specified directories will be compiled as a single compilation unit, similar to compiling a single directory.
[source-set.${source set fully qualified name}]
src-dir = ["./src/linux/dirA", "./src/linux/dirB"]
condition
The condition specifies one or more features that must be enabled to compile the corresponding code.
# If the source set declaration does not include a `condition` field, no constraints are set.
# Possible syntax:
# 1. Multiple features must be satisfied simultaneously.
[source-set.${source set fully qualified name}]
condition = ["feature.arch.aarch64", "feature.env.ohos", "feature.os.linux"]
# 2. Any one of multiple features must be satisfied.
[source-set.${source set fully qualified name}]
condition.1 = ["feature.arch.little"]
condition.alpha = ["feature.env.ohos"]
condition.beta = ["feature.os.linux", "feature.os.posix"] # Both features must be satisfied.
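The two condition forms can be read as "all of" for a plain list and "any group, all within the group" for the keyed form. The Python sketch below models that reading; it is an interpretation of the description above rather than cjpm's code, and the function name condition_met and the dict representation of the keyed form are assumptions.

```python
def condition_met(condition, enabled_features):
    """Check a source-set condition against the enabled features.

    condition is None (no constraint), a list (every feature required),
    or a dict of named lists (at least one list fully satisfied).
    """
    enabled = set(enabled_features)
    if condition is None:
        return True
    if isinstance(condition, dict):
        return any(set(group) <= enabled for group in condition.values())
    return set(condition) <= enabled


# The keyed example above, written as a dict.
keyed = {
    "1": ["feature.arch.little"],
    "alpha": ["feature.env.ohos"],
    "beta": ["feature.os.linux", "feature.os.posix"],
}
print(condition_met(keyed, ["feature.os.linux"]))                      # False
print(condition_met(keyed, ["feature.os.linux", "feature.os.posix"]))  # True
```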
Special Source Sets and Their Configuration
Common Source Set
Only one source set can be selected at the same level to ensure code isolation across platforms. However, some code should always be included. This is called the common source set, whose code is always included in the compilation process, and the condition configuration is not applicable. Example:
[source-set.common]
src-dir = "./common"
[source-set.socketSelection.common]
src-dir = "./socket/common"
[source-set.socketSelection.kqueue]
src-dir = "./socket/kqueue"
condition = [ "feature.os.kqueue" ]
Nested Source Sets
When source sets have nested levels, the directory hierarchy of the code must correspond to the source set hierarchy (i.e., the parent source set path must be a prefix of the child source set path). Additionally, nested source sets do not support multiple paths.
[source-set.common]
src-dir = "./common"
[source-set.socketSelection]
src-dir = "./socketSelection/weird"
[source-set.socketSelection.a]
src-dir = "./socketSelection/weird/a"
[source-set.socketSelection.b]
src-dir = "./socketSelection/weird/b"
Other Source Set
If the configuration file contains only single-level source sets and cjpm cannot match any source set, a compilation error will occur. However, for nested source sets, cjpm implicitly generates a special source set named other, whose path is derived from the parent path. To customize the path, explicitly configure this source set, but its condition cannot be configured. Example:
[source-set.common]
src-dir = "./common"
[source-set.socketSelection]
src-dir = "./socketSelection"
[source-set.socketSelection.weird]
src-dir = "./socketSelection/weird"
condition = [ "feature.os.linux", "feature.os.windows" ] # Practically impossible condition to meet
# This source set will be selected as fallback
[source-set.socketSelection.other]
src-dir = "./socketSelection/other"
If no top-level source set is selected, the following error message will appear:
"No source set specified in ${path/to/cjpm.toml} was selected"
Default Source Set
The default source set represents the types of source sets supported by default in cjpm. If no modifications are needed, they do not need to be specified in cjpm.toml. To modify them, specify and configure the corresponding source set in the configuration file, which will override the default configuration.
For example, setting an empty source-set configuration indicates that the project is a multi-platform project.
[source-set]
In this case, cjpm will implicitly generate the default configuration as follows:
[source-set.common]
src-dir = "./common"
[source-set.windows]
src-dir = "./windows"
condition = ["feature.os.windows"]
[source-set.linux]
src-dir = "./linux"
condition = ["feature.os.linux", "feature.env.gnu"]
[source-set.darwin]
src-dir = "./darwin"
condition = ["feature.os.darwin"]
[source-set.android]
src-dir = "./android"
condition = ["feature.os.linux", "feature.env.android"]
[source-set.hos]
src-dir = "./hos"
condition = ["feature.os.linux", "feature.env.hos"]
[source-set.ohos]
src-dir = "./ohos"
condition = ["feature.os.linux", "feature.env.ohos"]
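Selection among the default source sets can be sketched as follows. This Python model covers only the top-level rules stated in this section (common is always included, exactly one other source set must match) and leaves out the nested other fallback. The names DEFAULT_SOURCE_SETS and select_source_sets are hypothetical, and the error wording is illustrative.

```python
# Default configuration from the listing above: name -> required features.
DEFAULT_SOURCE_SETS = {
    "common": None,  # always included; condition does not apply
    "windows": ["feature.os.windows"],
    "linux": ["feature.os.linux", "feature.env.gnu"],
    "darwin": ["feature.os.darwin"],
    "android": ["feature.os.linux", "feature.env.android"],
    "hos": ["feature.os.linux", "feature.env.hos"],
    "ohos": ["feature.os.linux", "feature.env.ohos"],
}


def select_source_sets(source_sets, enabled_features):
    """Return the common set plus the single source set whose condition holds."""
    enabled = set(enabled_features)
    selected = ["common"] if "common" in source_sets else []
    matches = [
        name
        for name, condition in source_sets.items()
        if name != "common" and set(condition) <= enabled
    ]
    if not matches:
        raise LookupError("no source set was selected")
    if len(matches) > 1:
        raise LookupError(f"multiple source sets matched: {matches}")
    return selected + matches


print(select_source_sets(DEFAULT_SOURCE_SETS, ["feature.os.linux", "feature.env.gnu"]))
# ['common', 'linux']
```

With only feature.os.linux enabled, none of the default conditions is fully satisfied, which corresponds to the error case described earlier.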
Debugging Tool
cjdb (Cangjie Debug) is a command-line debugging tool for Cangjie programs, developed on top of lldb. The current cjdb tool is adapted from LLVM 15.0.4 and provides Cangjie developers with program debugging capabilities.
Obtaining the cjdb Tool
Acquisition Method
Obtain it through the Cangjie SDK.
The path of the cjdb tool in the SDK: cangjie\tools\bin.
Usage Example
The following demonstrates the usage on the Windows platform:
After decompression, simply run cjdb.exe in the tool’s directory cangjie\tools\bin.
Note:
Explanation of system parameter values:

| system parameter value | Description |
| --- | --- |
| windows | Tool for Windows platform |
| linux | Tool for Linux platform |
| darwin | Tool for macOS platform |

Important:
Ensure that the compiler used to build the ELF file or application being debugged matches the version of the toolchain from which the cjdb debugger was obtained.
cjdb Commands
Note:
To view more commands, execute help in the command-line window:
(cjdb) help
Debugger commands:
  apropos -- List debugger commands related to a word or subject.
  breakpoint -- Commands for operating on breakpoints (see 'help b' for shorthand.)
  cjthread -- Commands for operating on one or more cjthread in the current process.
  command -- Commands for managing custom LLDB commands.
  disassemble -- Disassemble specified instructions in the current target. Defaults to the current function for the current thread and stack frame.
  expression -- Evaluate an expression on the current thread. Displays any returned value with LLDB's default formatting.
  frame -- Commands for selecting and examing the current thread's stack frames.
  ...
Logging
To help locate issues, use the log <subcommand> [<command-options>] command to record cjdb logs.
- help log to view log command help
  (cjdb) help log
  Commands controlling LLDB internal logging.
  Syntax: log <subcommand> [<command-options>]
  The following subcommands are supported:
    disable -- Disable one or more log channel categories.
    enable -- Enable logging for a single log channel.
    list -- List the log categories for one or more log channels. If none specified, lists them all.
    timers -- Enable, disable, dump, and reset LLDB internal performance timers.
  For more help on any particular subcommand, type 'help <command> <subcommand>'.
- log list to view supported log list
  (cjdb) log list
Other commands can be explored using the help command.
Platform
Commands in cjdb for managing and creating platforms include platform [connect|disconnect|info|list|status|select] ...
- View platform help information on the Windows platform.
  (cjdb) help platform
  Commands to manage and create platforms.
  Syntax: platform [connect|disconnect|info|list|status|select] ...
  The following subcommands are supported:
    connect -- Select the current platform by providing a connection URL.
    disconnect -- Disconnect from the current platform.
    file -- Commands to access files on the current platform.
    get-file -- Transfer a file from the remote end to the local host.
    get-size -- Get the file size from the remote end.
    list -- List all platforms that are available.
    mkdir -- Make a new directory on the remote end.
    process -- Commands to query, launch and attach to processes on the current platform.
    put-file -- Transfer a file from this system to the remote end.
    select -- Create a platform if needed and select it as the current platform.
    settings -- Set settings for the current target's platform, or for a platform by name.
    shell -- Run a shell command on the current platform. Expects 'raw' input (see 'help raw-input'.)
    status -- Display status for the current platform.
    target-install -- Install a target (bundle or executable file) to the remote end.
  For more help on any particular subcommand, type 'help <command> <subcommand>'.
  (cjdb)
Functions
Stepping Into Functions with Debug Information
Use thread step-over <cmd-options> [<thread-id>] (thread step-over can be abbreviated as next or n) to skip stepping into functions and directly execute the next line of code.
(cjdb) n
Process 2884 stopped
* thread #1, name = 'test', stop reason = step over
frame #0: 0x0000000000401498 test`default.main() at test.cj:5:7
2 main(): Int64 {
3
4 var a : Int32 = 12
-> 5 a = a + 23
6 a = test(10, 34)
7 return 1
8 }
(cjdb)
When debugging with cjdb, use thread step-in <cmd-options> [<thread-id>] (thread step-in can be abbreviated as step or s) to step into functions (the function must have debug information).
(cjdb) n
Process 5240 stopped
* thread #1, name = 'test', stop reason = step over
frame #0: 0x00000000004014d8 test`default.main() at test.cj:6:7
3
4 var a : Int32 = 12
5 a = a + 23
-> 6 a = test(10, 34)
7 return 1
8 }
9
(cjdb) s
Process 5240 stopped
* thread #1, name = 'test', stop reason = step in
frame #0: 0x0000000000401547 test`default.test(a=10, b=34) at test.cj:12:10
9
10 func test(a : Int32, b : Int32) : Int32 {
11
-> 12 return a + b
13 }
14
(cjdb)
Exiting the Current Function
Execute the finish command to exit the current function and return to its caller, the previous frame on the call stack.
(cjdb) s
Process 5240 stopped
* thread #1, name = 'test', stop reason = step in
frame #0: 0x0000000000401547 test`default.test(a=10, b=34) at test.cj:12:10
9
10 func test(a : Int32, b : Int32) : Int32 {
11
-> 12 return a + b
13 }
14
(cjdb) finish
Process 5240 stopped
* thread #1, name = 'test', stop reason = step out
Return value: (int) $0 = 44
frame #0: 0x00000000004014dd test`default.main() at test.cj:6:7
3
4 var a : Int32 = 12
5 a = a + 23
-> 6 a = test(10, 34)
7 return 1
8 }
9
(cjdb)
Breakpoints
Setting Source Code Breakpoints
breakpoint set --file test.cj --line line_number
--line specifies the line number.
--file specifies the file.
For single files, only the line number is required. For multiple files, the filename must be included.
b test.cj:4 is shorthand for breakpoint set --file test.cj --line 4.
Example: breakpoint set --line 2
(cjdb) b 2
Breakpoint 1: where = test`default.main() + 13 at test.cj:4:3, address = 0x0000000000401491
(cjdb) b test.cj : 4
Breakpoint 2: where = test`default.main() + 13 at test.cj:4:3, address = 0x0000000000401491
(cjdb)
Setting Function Breakpoints
breakpoint set --name function_name
--name specifies the function name to set the breakpoint.
b test is shorthand for breakpoint set --name test.
Example: breakpoint set --name test
(cjdb) b test
Breakpoint 3: where = test`default.test(int, int) + 19 at test.cj:12:10, address = 0x0000000000401547
(cjdb)
Setting Conditional Breakpoints
breakpoint set --file xx.cj --line line_number --condition expression
--file specifies the file.
--line specifies the line number.
--condition specifies the condition.
Example: breakpoint set --file test.cj --line 4 --condition a==12
(cjdb) breakpoint set --file test.cj --line 4 --condition a==12
Breakpoint 2: where = main`default::main() + 60 at test.cj:4:9, address = 0x00005555555b62d0
(cjdb) c
Process 3128551 resuming
Process 3128551 stopped
* thread #1, name = 'schmon', stop reason = breakpoint 2.1
frame #0: 0x00005555555b62d0 main`default::main() at test.cj:4:9
1 main(): Int64 {
2
3 var a : Int32 = 12
-> 4 a = a + 23
5 return 1
6 }
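When cjdb commands are generated by a script, the source, function, and conditional breakpoint forms described above can be assembled by one small helper. The following Python sketch only builds the command text and does not talk to cjdb; the function name breakpoint_command is hypothetical, and it uses only the --file, --line, --name, and --condition options shown in this section.

```python
def breakpoint_command(file=None, line=None, name=None, condition=None):
    """Build a `breakpoint set` command from the options documented above."""
    # A breakpoint targets either a source line or a function name.
    if (line is None) == (name is None):
        raise ValueError("give exactly one of line or name")
    parts = ["breakpoint", "set"]
    if file is not None:
        parts += ["--file", file]
    if line is not None:
        parts += ["--line", str(line)]
    if name is not None:
        parts += ["--name", name]
    if condition is not None:
        # Assumption: the condition contains no spaces, so no quoting is applied.
        parts += ["--condition", condition]
    return " ".join(parts)


print(breakpoint_command(file="test.cj", line=4))
print(breakpoint_command(name="test"))
print(breakpoint_command(file="test.cj", line=4, condition="a==12"))
```

The last call reproduces the command used in the conditional breakpoint example above.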
Continuing to the Next Breakpoint
(cjdb) c
Process 2884 resuming
Process 2884 stopped
* thread #1, name = 'test', stop reason = breakpoint 3.1
frame #0: 0x0000000000401547 test`default.test(a=10, b=34) at test.cj:12:10
9
10 func test(a : Int32, b : Int32) : Int32 {
11
-> 12 return a + b
13 }
14
(cjdb)
Watchpoints
watchpoint set variable -w read variable_name
-w specifies the watchpoint type, which can be read, write, or read_write.
wa s v is shorthand for watchpoint set variable.
Example: watchpoint set variable -w read a
(cjdb) wa s v -w read a
Watchpoint created: Watchpoint 1: addr = 0x7fffddffed70 size = 8 state = enabled type = r
declare @ 'test.cj:27'
watchpoint spec = 'a'
new value: 10
(cjdb)
Watchpoints can only be set on basic types. On Windows, when setting conditions for watchpoints, the program will pause at most once.
Expression Evaluation
In cjdb, expression evaluation can be performed using expression <cmd-options> -- <expr> (expression can be abbreviated as expr).
- Viewing literals
Example: expr 3
(cjdb) expr 3
(Int64) $0 = 3
(cjdb)
- Viewing variable names
Example: expr a
(cjdb) expr a
(Int64) $0 = 3
(cjdb)
- Viewing arithmetic expressions
Example: expr a + b
(cjdb) expr a + b
(Int64) $0 = 3
(cjdb)
- Viewing relational expressions
Example: expr a > b
(cjdb) expr a > b
(Bool) $0 = false
(cjdb)
- Viewing logical expressions
Example: expr a && b
(cjdb) expr true && false
(Bool) $0 = false
(cjdb)
- Viewing postfix expressions
Example: expr a.b
(cjdb) expr value.member
(Int64) $0 = 1
(cjdb)
Example: expr a[b]
(cjdb) expr array[2]
(Int64) $0 = 3
(cjdb)
- Viewing generic instantiation variables
Example: expr a
(cjdb) expr a
(default.A<Int32>) $0 = {
member = 1
}
(cjdb)
Supported expression evaluations include but are not limited to: literals, variable names, parenthesized expressions, arithmetic expressions, relational expressions, conditional expressions, loop expressions, member access expressions, index access expressions, range expressions, bitwise operation expressions, generic instantiation variables, etc.
Note:
Unsupported expression evaluations include: function calls with named parameters, interop, extensions, attributes, aliases, interpolated strings, function names. The Windows platform does not support the Float16 type, and exception throwing is not supported.
Variable Inspection
- Viewing local variables: locals
(cjdb) locals
(Int32) a = 12
(Int64) b = 68
(Int32) c = 13
(Array<Int64>) array = {
[0] = 2
[1] = 4
[2] = 6
}
(pkgs.Rec) newR2 = {
age = 5
name = "string"
}
(cjdb)
When the debugger stops at a certain point in the program, using locals allows you to see all local variables within the scope of the current function’s lifecycle. Only variables that have been initialized at the current position can be correctly viewed; uninitialized variables cannot be properly inspected.
- Viewing a single variable: print variable_name
Example: print b
(cjdb) print b
(Int64) $0 = 110
(cjdb)
Use the print command (abbreviated as p), followed by the name of the specific variable to inspect.
- Viewing String type variables
(cjdb) print newR2.name
(String) $0 = "string"
(cjdb)
- Viewing struct and class type variables
(cjdb) print newR2
(pkgs.Rec) $0 = {
age = 5
name = "string"
}
(cjdb)
- Viewing arrays
(cjdb) print array
(Array<Int64>) $0 = {
[0] = 2
[1] = 4
[2] = 6
[3] = 8
}
(cjdb) print array[1..3]
(Array<Int64>) $1 = {
[1] = 4
[2] = 6
}
(cjdb)
Supports viewing basic types (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float16, Float32, Float64, Bool, Unit, Rune).
Supports range viewing. The interval [start..end) is a left-closed, right-open interval. Reverse order is currently not supported.
For an invalid interval, or an attempt to view a range on a non-array type, an error message is displayed.
(cjdb) print array
(Array<Int64>) $0 = {
[0] = 0
[1] = 1
}
(cjdb) print array[1..3]
error: unsupported expression
(cjdb) print array[0][0]
error: unsupported expression
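The left-closed, right-open interval rule can be modeled in a few lines. This Python sketch is not cjdb code: view_range is a hypothetical helper that mirrors the documented behavior of print array[start..end], and rejecting an empty interval (start equal to end) is an assumption, since the text above only mentions reverse order and invalid intervals.

```python
def view_range(values, start, end):
    """Return (index, value) pairs for the half-open interval [start..end)."""
    if not isinstance(values, list):
        # Ranges only apply to arrays.
        raise ValueError("unsupported expression")
    if start < 0 or start >= end or end > len(values):
        # Reverse, empty (assumed), or out-of-bounds interval.
        raise ValueError("unsupported expression")
    return [(i, values[i]) for i in range(start, end)]


array = [2, 4, 6, 8]
print(view_range(array, 1, 3))  # [(1, 4), (2, 6)]
```

With the four-element array from the earlier example, the interval 1..3 yields indices 1 and 2 only. With the two-element array shown just above, the same interval is rejected because its end exceeds the array length.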
- Viewing CString type variables
(cjdb) p cstr
(cro.CString) $0 = "abc"
(cjdb) p cstr
(cro.CString) $1 = null
- Viewing global variables: globals
(cjdb) globals
(Int64) pkgs.Rec.g_age = 100
(Int64) pkgs.g_var = 123
(cjdb)
When using the print command to view a single global variable, print + package name + variable name is not supported. Only print + variable name is supported. For example, to view the global variable g_age, use the following command:
(cjdb) p g_age
(Int64) $0 = 100
(cjdb)
- Modifying variables
(cjdb) set a=30
(Int32) $4 = 30
(cjdb)
You can use set to modify the value of a local variable. Only basic numeric types are supported (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64).
For Bool type variables, you can use the value 0 (false) or non-zero (true) to modify them. For Rune type variables, you can use the corresponding ASCII code to modify them.
(cjdb) set b = 0
(Bool) $0 = false
(cjdb) set b = 1
(Bool) $1 = true
(cjdb) set c = 0x41
(Rune) $2 = 'A'
(cjdb)
If the modified value is non-numeric or exceeds the variable’s range, an error message will be displayed.
(cjdb) p c
(Rune) $0 = 'A'
(cjdb) set c = 'B'
error: unsupported expression
(cjdb) p b
(Bool) $1 = false
(cjdb) set b = true
error: unsupported expression
(cjdb) p u8
(UInt8) $3 = 123
(cjdb) set u8 = 256
error: unsupported expression
(cjdb) set u8 = -1
error: unsupported expression
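The acceptance rules for set can be summarized as a small validation routine. The Python sketch below is an illustration under stated assumptions, not cjdb's implementation: check_set_value and INT_RANGES are hypothetical names, only the integer types, Bool, and Rune are modeled (Float32 and Float64 are omitted), and the error text simply reuses the message shown above.

```python
# Value ranges of the integer types that `set` accepts.
INT_RANGES = {
    "Int8": (-2**7, 2**7 - 1),
    "Int16": (-2**15, 2**15 - 1),
    "Int32": (-2**31, 2**31 - 1),
    "Int64": (-2**63, 2**63 - 1),
    "UInt8": (0, 2**8 - 1),
    "UInt16": (0, 2**16 - 1),
    "UInt32": (0, 2**32 - 1),
    "UInt64": (0, 2**64 - 1),
}


def check_set_value(type_name, text):
    """Return the value `set` would store, or raise ValueError."""
    try:
        value = int(text, 0)  # base 0 accepts decimal and 0x-prefixed hex
    except ValueError:
        # Non-numeric input such as 'B' or true is rejected.
        raise ValueError("unsupported expression") from None
    if type_name == "Bool":
        return value != 0  # 0 is false, any non-zero value is true
    if type_name == "Rune":
        return chr(value)  # numeric code of the character
    low, high = INT_RANGES[type_name]
    if not low <= value <= high:
        raise ValueError("unsupported expression")
    return value


print(check_set_value("Int32", "30"))   # 30
print(check_set_value("Rune", "0x41"))  # A
```

Under this model, 256 and -1 are rejected for a UInt8 variable, matching the session above.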
Cangjie Threads
Supports viewing the ID, status, and frame information of Cangjie threads. Thread switching is currently not supported.
Displaying threads in the current target process
(cjdb) cjthread list
cjthread id: 1, state: running name: cjthread1
frame #0: 0x000055555557c140 main`ab::main() at varray.cj:16:1
cjthread id: 2, state: pending name: cjthread2
frame #0: 0x00007ffff7d8b9d5 libcangjie-runtime.so`CJ_CJThreadPark + 117
(cjdb)
Stack Backtrace
- Viewing the call stack of a specified Cangjie thread.
(cjdb) cjthread backtrace 1
cjthread #1 state: pending name: cangjie
frame #0: 0x00007ffff7d8b9d5 libcangjie-runtime.so`CJ_CJThreadPark + 117
frame #1: 0x00007ffff7d97252 libcangjie-runtime.so`CJ_TimerSleep + 66
frame #2: 0x00007ffff7d51b5d libcangjie-runtime.so`CJ_MRT_FuncSleep + 33
frame #3: 0x0000555555591031 main`std/sync::sleep(std/time::Duration) + 45
frame #4: 0x0000555555560941 main`default::lambda.0() at complex.cj:9:3
frame #5: 0x000055555555f68b main`default::std/core::Future<Unit>::execute(this=<unavailable>) at future.cj:124:35
frame #6: 0x00007ffff7d514f1 libcangjie-runtime.so`___lldb_unnamed_symbol1219 + 7
frame #7: 0x00007ffff7d4dc52 libcangjie-runtime.so`___lldb_unnamed_symbol1192 + 114
frame #8: 0x00007ffff7d8b09a libcangjie-runtime.so`CJ_CJThreadEntry + 26
(cjdb)
In the cjthread backtrace 1 command, 1 is the specified cjthread ID.
Executable File Debugging
Launch Method
There are two ways to load the target program using the launch method:
- Start the debugger and load the target program simultaneously.
~/0901/cangjie_test$ cjdb test
(cjdb) target create "test"
Current executable set to '/0901/cangjie-linux-x86_64-release/bin/test' (x86_64).
(cjdb)
- Start the debugger first, then load the target program using the file command.
~/0901/cangjie_test$ cjdb
(cjdb) file test
Current executable set to '/0901/cangjie/test' (x86_64).
(cjdb)
Attach Mode
Debugging the target program via attach mode
For already running programs, cjdb supports debugging through attach mode as follows:
~/0901/cangjie-linux-x86_64-release/bin$ cjdb
(cjdb) attach 15325
Process 15325 stopped
* thread #1, name = 'test', stop reason = signal SIGSTOP
frame #0: 0x00000000004014cd test`default.main() at test.cj:7:9
4 var a : Int32 = 12
5 a = a + 23
6 while (true) {
-> 7 a = 1
8 }
9 a = test(10, 34)
10 return 1
thread #2, name = 'FinalProcessor', stop reason = signal SIGSTOP
frame #0: 0x00007f48c12fc065 libpthread.so.0`__pthread_cond_timedwait at futex-internal.h:205
thread #3, name = 'PoolGC_1', stop reason = signal SIGSTOP
frame #0: 0x00007f48c12fbad3 libpthread.so.0`__pthread_cond_wait at futex-internal.h:88
thread #4, name = 'MainGC', stop reason = signal SIGSTOP
frame #0: 0x00007f48c12fc065 libpthread.so.0`__pthread_cond_timedwait at futex-internal.h:205
thread #5, name = 'schmon', stop reason = signal SIGSTOP
frame #0: 0x00007f48c0fe17a0 libc.so.6`__GI___nanosleep(requested_time=0x00007f48a8ffcb70, remaining=0x0000000000000000) at nanosleep.c:28
Executable module set to "/0901/cangjie-linux-x86_64-release/bin/test".
Architecture set to: x86_64-unknown-linux-gnu.
Android Remote Debugging
- To perform remote debugging, first start lldb-server on the Android platform.
adb shell /data/local/tmp/lldb-server platform --listen "*:1234"
- Launch the debugger in a separate window.
(cjdb) platform select remote-android
Platform: remote-android
Connected: no
(cjdb) platform connect connect://FMR0223A31052288:1234
Platform: remote-android
Triple: aarch64-unknown-linux-android
OS Version: 31 (5.10.43)
Hostname: localhost
Connected: yes
WorkingDir: /
Kernel: #1 SMP PREEMPT Wed Mar 20 12:20:52 CST 2024
(cjdb) attach 29551
(cjdb) Process 29551 stopped
* thread #1, name = 'main', stop reason = signal SIGSTOP
frame #0: 0x0000007f83f7c5e4 libc.so`nanosleep + 4
libc.so`nanosleep:
-> 0x7f83f7c5e4 <+4>: svc #0
0x7f83f7c5e8 <+8>: cmn x0, #0x1, lsl #12 ; =0x1000
0x7f83f7c5ec <+12>: cneg x0, x0, hi
0x7f83f7c5f0 <+16>: b.hi 0x7f83f7b6cc ; __set_errno_internal
Executable module set to "C:\Users\user\.lldb\module_cache\remote-android\.cache\88DA3010\main".
Architecture set to: aarch64-unknown-linux-android.
(cjdb)
Select the Android platform for remote debugging.
platform select remote-android
Connect to the Android device for remote debugging.
platform connect connect://FMR0223A31052288:1234
Attach to a process for remote debugging.
attach 29551
iOS Simulator Remote Debugging
Since running programs on the iOS simulator requires launching via Xcode, follow these steps when debugging programs running on the iOS simulator using cjdb:
- First, launch the target program using Xcode.
- Use the detach command in the Xcode lldb command line to disconnect debugging.
- Start cjdb from the command line and use the attach command to load the target program.
Once successfully loaded, you can proceed with normal debugging using cjdb. Because Xcode and cjdb depend on different versions of llvm, the additional compilation parameter -gdwarf-4 must be added during compilation.
iOS Device Remote Debugging
Due to iOS device security policies, cjdb cannot directly debug programs on physical devices. Debugging must be performed using lldb bundled with Xcode. Therefore, a Python script extension is provided to support debugging Cangjie programs.
To debug programs running on an iOS device, follow these steps. For compilation instructions, refer to the “Compilation and Building” chapter in the Cangjie Programming Language Development Guide.
- First, launch the target program using Xcode.
- Load the script in the Xcode debug window command line using the command: command script import $CANGJIE_HOME/tools/script/cangjie_cjdb.py, where $CANGJIE_HOME should be replaced with the Cangjie installation directory. To automatically load the script every time Xcode starts debugging, create a .lldbinit file in the user directory and enter the above command.
- Once successfully loaded, normal debugging can proceed.
Since iOS devices use Python script extension capabilities, the debugging features are limited by the Python capabilities exposed by lldb. Therefore, there are differences in supported debugging features compared to cjdb. Expression evaluation, conditional breakpoints, and demangle functionality are currently unsupported. None will be displayed as nullptr.
Viewing Cangjie Thread Call Stacks
Use the cjthread command to view Cangjie call stacks.
(lldb) cjthread
cjthread #6 state: pending name:
frame #0: 0x7ffff7f0299d libcangjie-runtime.so`CJ_CJThreadPark
frame #1: 0x7ffff7f19f3e libcangjie-runtime.so`CJ_TimerSleep
frame #2: 0x7ffff7e1c95a libcangjie-runtime.so`MRT_Sleep
frame #3: 0x7ffff7e224c1 libcangjie-runtime.so`CJ_MRT_Sleep
frame #4: 0x55555563a5e9 main`std.core.sleep(std.core::Duration) at sleep.cj:36
frame #5: 0x5555555f3e0b main`default.create_spawn::lambda.0() at cjthread.cj:8
frame #6: 0x5555555f3f67 main`_CCN7default12create_spawnHRNat6StringEEL_E$g
frame #7: 0x5555556425ba main`std.core.Future<...>::execute() at future.cj:161
cjthread #5 state: pending name:
frame #0: 0x7ffff7f0299d libcangjie-runtime.so`CJ_CJThreadPark
frame #1: 0x7ffff7f19f3e libcangjie-runtime.so`CJ_TimerSleep
frame #2: 0x7ffff7e1c95a libcangjie-runtime.so`MRT_Sleep
frame #3: 0x7ffff7e224c1 libcangjie-runtime.so`CJ_MRT_Sleep
frame #4: 0x55555563a5e9 main`std.core.sleep(std.core::Duration) at sleep.cj:36
frame #5: 0x5555555f3e0b main`default.create_spawn::lambda.0() at cjthread.cj:8
frame #6: 0x5555555f3f67 main`_CCN7default12create_spawnHRNat6StringEEL_E$g
frame #7: 0x5555556425ba main`std.core.Future<...>::execute() at future.cj:161
Note:
Currently, mixed call stacks (i.e., call stacks containing frames from other languages) cannot be displayed. A maximum of 100 Cangjie thread call stacks can be shown. Each Cangjie thread call stack is limited to displaying 2048 bytes.
Notes
- The program being debugged must be compiled in debug version, such as programs compiled with the following command:
cjc -g test.cj -o test
- When a developer defines a generic object and steps into its init function during debugging, the stack information will display two package names: one for the package where the generic object is instantiated, and another for the package where the generic is defined.
* thread #1, name = 'main', stop reason = step in
frame #0: 0x0000000000404057 main`default.p1.Pair<String, Int64>.init(a="hello", b=0) at a.cj:21:9
18 let x: T
19 let y: U
20 public init(a: T, b: U) {
-> 21 x = a
22 y = b
23 }
- For displaying Enum types, if the enum constructor has parameters, it will be displayed as follows:
enum E {
    Ctor(Int64, String) | Ctor
}
main() {
    var temp = E.Ctor(10, "String")
    0
}
========================================
(cjdb) p temp
(E) $0 = Ctor {
arg_1 = 10
arg_2 = "String"
}
Note that arg_x is not an actual printable member variable - the Enum does not actually contain member variables named arg_x.
- cjdb is built upon lldb and therefore supports all native basic functionalities of lldb. For details, refer to the lldb official documentation.
- If developers run cjdb on system environments newer than llvm 15.0.4, compatibility issues and risks may arise, such as in C language interoperability scenarios where cjdb cannot properly resolve C code file and line number information.
int32_t cfoo()
{
    printf("cfoo\n");
    return 0;
}
foreign func cfoo(): Int32
unsafe main() {
    cfoo()
}
# step 1: Compile C file using system's native clang version to generate dylib
clang -g -shared cffi.c -o libcffi.dylib
# step 2: Compile CJ file using cjc and link with C dynamic library
cjc -g test.cj -L. -lcffi -o test
# step 3: Debug test file using cjdb (C code debugging fails due to incompatible debug info)
cjdb test
(cjdb) target create "test"
Current executable set to 'test' (x86_64).
(cjdb) b cfoo
Breakpoint 1: where = libcffi.dylib`cfoo + 4, address = 0x0000000000000f84
(cjdb) r
Process 3133 launched: 'test' (x86_64)
Process 3133 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001000a6f84 libcffi.dylib`cfoo
1 foreign func cfoo(): Int32
2 unsafe main() {
3 cfoo()
-> 4 }
FAQ
- cjdb reports error: process launch failed: 'A' packet returned an error: 8 in docker environment.
root@xxx:/home/cj/cangjie-example#cjdb ./hello
(cjdb) target create "./hello"
Current executable set to '/home/cj/cangjie-example/hello' (x86_64).
(cjdb) b main
Breakpoint 1: 2 locations.
(cjdb) r
error: process launch failed: 'A' packet returned an error: 8
(cjdb)
Cause: The container was created without the SYS_PTRACE capability.
Solution: Remove the existing container and create a new one with the following options:
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --security-opt apparmor=unconfined
- cjdb reports stop reason = signal XXX.
Process 32491 stopped
* thread #2, name = 'PoolGC_1', stop reason = signal SIGABRT
frame #0: 0x00007ffff450bfb7 lib.so.6`__GI_raise(sig=2) at raise.c:51
Cause: The program continuously generates SIGABRT signals, triggering debugger pauses.
Solution: Execute the following command to ignore such signals:
(cjdb) process handle --pass true --stop false --notify true SIGBUS
NAME PASS STOP NOTIFY
=========== ===== ===== ======
SIGBUS true false true
(cjdb)
- cjdb doesn’t catch SIGSEGV signals.
Cause: cjdb is configured by default not to catch SIGSEGV signals.
Solution: Developers needing to catch this signal during debugging can reconfigure with:
(cjdb)process handle -p true -s true -n true SIGSEGV
NAME PASS STOP NOTIFY
=========== ===== ===== ======
SIGSEGV true true true
(cjdb)
- cjdb cannot step into catch blocks using next/s commands.
Cause: Cangjie uses the LandingPad mechanism for exception handling, which cannot deterministically identify which catch block will handle exceptions from try blocks. Similar issues exist in clang++.
Solution: Developers needing to debug catch blocks should set breakpoints within them.
(cjdb) b 31
Breakpoint 2: where = main`default::test(Int64) + 299 at a.cj:31:18, address = 0x000055555557caff
(cjdb) n
Process 1761640 stopped
* thread #1, name = 'schmon', stop reason = breakpoint 2.1
frame #0: 0x000055555557caff main`default::test(a=0) at a.cj:31:18
28 s = 12/a
29 } catch (e:Exception) {
30
->31 error_result = e.toString()
32 println(error_result)
33 }
34 s
(cjdb)
- Expression evaluation error on macOS: Expression can't be run, because there is no JIT compiled function.
Cause: Expression evaluation is currently unsupported on the macOS platform.
- On macOS aarch64 architecture, some environments report Connection shut down by remote side while waiting for reply to initial handshake packet during expression evaluation.
Cause: On some systems the debug service terminates abnormally.
Solution: Delete the third_party/llvm/bin/debugserver file and restart debugging.
- When setting breakpoints involving generic type parameters, the parameter names appear as T0, T1, … Tn. Example:
func global_func_02<K, G>() { 0 }
public struct Pair<T, U> {
    let x: T
    let y: U
    public init(a: T, b: U) {
        x = a
        y = b
    }
}
main() {
    var a: Pair<String, Int64> = Pair<String, Int64>("hello", 0)
    global_func_02<Int64, String>()
    0
}
========================================
(cjdb) b 1
Breakpoint 1: where = main`default::global_func_02<T0,T1>() + 9 at test.cj:1:33, address = 0x0000000000019989
(cjdb) b 6
Breakpoint 2: where = main`default::Pair<T0,T1>::init(T0, T1) + 150 at test.cj:6:9, address = 0x000000000001982a
Cause: Cangjie maintains ABI compatibility for generic type parameters - when developer-side generic parameter names change, the symbol names in Cangjie’s binary symbol table remain unchanged.
Solution: Modify developer-written generic parameter names to T0, T1, … Tn.
Formatting Tool
Feature Overview
cjfmt (Cangjie Formatter) is an automatic code formatting tool developed based on the Cangjie language programming specifications.
Usage Instructions
Use the command line operation cjfmt [option] file [option] file
cjfmt -h displays help information and option descriptions.
Usage:
cjfmt -f fileName [-o fileName] [-l start:end]
cjfmt -d fileDir [-o fileDir]
Options:
-h Show usage
eg: cjfmt -h
-v Show version
eg: cjfmt -v
-f Specifies the file to be formatted. The value can be a relative path or an absolute path.
eg: cjfmt -f test.cj
-d Specifies the directory containing files to be formatted. The value can be a relative path or an absolute path.
eg: cjfmt -d test/
-o <value> Output. For single file formatting, '-o' is followed by the output file name (supports relative/absolute paths).
For directory formatting, a path must be specified after -o (supports relative/absolute paths).
eg: cjfmt -f a.cj -o ./fmta.cj
eg: cjfmt -d ~/testsrc -o ./testout
-c <value> Specifies the formatting configuration file (supports relative/absolute paths).
If the specified config file fails to load, cjfmt will attempt to read the default config file from CANGJIE_HOME.
If the default config also fails, built-in configurations will be used.
eg: cjfmt -f a.cj -c ./config/cangjie-format.toml
eg: cjfmt -d ~/testsrc -c ~/home/project/config/cangjie-format.toml
-l <region> Only formats lines within the specified region of the provided file (only valid for single file formatting).
Region format: [start:end] where 'start' and 'end' are integers representing first/last lines to format (line count starts at 1).
eg: cjfmt -f a.cj -o ./fmta.cj -l 1:25
File Formatting
cjfmt -f
- Format and overwrite source file (supports relative/absolute paths):
cjfmt -f ../../../test/uilang/Thread.cj
- Option -o creates a new .cj file with formatted output (supports relative/absolute paths for both source and output):
cjfmt -f ../../../test/uilang/Thread.cj -o ../../../test/formated/Thread.cj
Directory Formatting
cjfmt -d
- Option -d specifies a directory of Cangjie source files to format (supports relative/absolute paths):
cjfmt -d test/ # Relative path source directory
cjfmt -d /home/xxx/test # Absolute path source directory
- Option -o specifies the output directory (can be existing or new; supports relative/absolute paths). Note: MAX_PATH length varies by system (e.g., typically ≤260 on Windows, ≤4096 recommended on Linux):
cjfmt -d test/ -o /home/xxx/testout
cjfmt -d /home/xxx/test -o ../testout/
cjfmt -d testsrc/ -o /home/../testout # Error if source directory doesn't exist: "error: Source file path not exist!"
Formatting Configuration
cjfmt -c
- Option -c allows specifying a custom formatting configuration file:
cjfmt -f a.cj -c ./cangjie-format.toml
Default cangjie-format.toml configuration (also represents built-in defaults):
# indent width
indentWidth = 4 # Range of indentWidth: [0, 8]
# limit length
linelimitLength = 120 # Range of linelimitLength: [1, 120]
# line break type
lineBreakType = "LF" # "LF" or "CRLF"
# allow Multi-line Method Chain when it's level equal or greater than multipleLineMethodChainLevel
allowMultiLineMethodChain = false
# if allowMultiLineMethodChain's value is true,
# and method chain's level is equal or greater than multipleLineMethodChainLevel,
# method chain will be formatted to multi-line method chain.
# e.g. A.b().c() level is 2, A.b().c().d() level is 3
# ObjectA.b().c().d().e().f() =>
# ObjectA
# .b()
# .c()
# .d()
# .e()
# .f()
multipleLineMethodChainLevel = 5 # Range of multipleLineMethodChainLevel: [2, 10]
# allow Multi-line Method Chain when it's length greater than linelimitLength
multipleLineMethodChainOverLineLength = true
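To make the method chain options concrete, the following Python sketch computes a chain's level and decides whether it would be split. It is a rough model, not cjfmt's algorithm: chain_level, should_split, and split_chain are hypothetical names, the regular expression only handles simple dotted call chains, and combining the level rule and the length rule with a logical OR is an assumption drawn from the comments in the configuration above.

```python
import re

CALL = r"\.\w+\("  # one ".name(" link in a method chain


def chain_level(expr):
    """A.b().c() has level 2, A.b().c().d() has level 3."""
    return len(re.findall(CALL, expr))


def should_split(expr, cfg):
    """Apply the level rule and the line-length rule (OR-combined by assumption)."""
    by_level = (cfg["allowMultiLineMethodChain"]
                and chain_level(expr) >= cfg["multipleLineMethodChainLevel"])
    by_length = (cfg["multipleLineMethodChainOverLineLength"]
                 and len(expr) > cfg["linelimitLength"])
    return by_level or by_length


def split_chain(expr, indent="    "):
    """Put each call of the chain on its own indented line."""
    head, *calls = re.split(r"(?=" + CALL + ")", expr)
    return "\n".join([head] + [indent + call for call in calls])


cfg = {
    "linelimitLength": 120,
    "allowMultiLineMethodChain": True,  # the default shown above is false
    "multipleLineMethodChainLevel": 5,
    "multipleLineMethodChainOverLineLength": True,
}
expr = "ObjectA.b().c().d().e().f()"
if should_split(expr, cfg):
    print(split_chain(expr))
```

With allowMultiLineMethodChain enabled and a level of 5, the chain is printed one call per line, as in the comment block of the configuration.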
Note:
If the custom config file fails to load, the tool attempts to read the default cangjie-format.toml from CANGJIE_HOME. If the default config also fails, built-in formatting options are used. If any config option fails to load, the built-in default for that option is used.
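The fallback order in the note can be expressed as a short resolution function. This Python sketch works on already-parsed dictionaries rather than on TOML files; resolve_config, BUILTIN, and VALID are hypothetical names, and treating an out-of-range or wrongly typed value as an option that "fails to load" is an assumption.

```python
# Built-in defaults, taken from the default cangjie-format.toml shown above.
BUILTIN = {
    "indentWidth": 4,
    "linelimitLength": 120,
    "lineBreakType": "LF",
    "allowMultiLineMethodChain": False,
    "multipleLineMethodChainLevel": 5,
    "multipleLineMethodChainOverLineLength": True,
}

# Per-option validity checks based on the documented ranges.
VALID = {
    "indentWidth": lambda v: type(v) is int and 0 <= v <= 8,
    "linelimitLength": lambda v: type(v) is int and 1 <= v <= 120,
    "lineBreakType": lambda v: v in ("LF", "CRLF"),
    "allowMultiLineMethodChain": lambda v: type(v) is bool,
    "multipleLineMethodChainLevel": lambda v: type(v) is int and 2 <= v <= 10,
    "multipleLineMethodChainOverLineLength": lambda v: type(v) is bool,
}


def resolve_config(custom, default):
    """custom/default: parsed config dicts, or None when the file failed to load."""
    if custom is not None:
        loaded = custom
    elif default is not None:
        loaded = default
    else:
        loaded = {}
    # Any option that is missing or invalid falls back to its built-in value.
    return {
        key: loaded[key] if key in loaded and VALID[key](loaded[key]) else builtin
        for key, builtin in BUILTIN.items()
    }


print(resolve_config({"indentWidth": 2, "linelimitLength": 500}, None))
```

In this call the custom file supplies a valid indentWidth, so 2 is kept, while linelimitLength is out of range and reverts to 120.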
Partial Formatting
cjfmt -l
- Option -l formats only the specified line range in a file (only works with single file formatting via -f; invalid with the directory option -d):
cjfmt -f a.cj -o b.cj -l 10:25 // Formats only lines 10-25
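The region argument uses 1-based, inclusive line numbers. The Python sketch below shows one way to parse start:end and pick out the affected lines; parse_region and lines_in_region are hypothetical helpers, and rejecting a region whose start is greater than its end is an assumption, since the option description does not say how cjfmt treats that case.

```python
def parse_region(text):
    """Parse 'start:end' into a (start, end) pair of 1-based line numbers."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError("region must look like start:end")
    start, end = int(start_text), int(end_text)
    if not 1 <= start <= end:
        # Assumption: reversed or non-positive regions are rejected.
        raise ValueError("invalid region")
    return start, end


def lines_in_region(lines, region):
    """Return the lines covered by the region; both bounds are inclusive."""
    start, end = parse_region(region)
    return lines[start - 1:end]


source = [f"line {n}" for n in range(1, 31)]
selected = lines_in_region(source, "10:25")
print(selected[0], "...", selected[-1], f"({len(selected)} lines)")
```

For 10:25 the selection covers 16 lines, from line 10 through line 25.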
Formatting Rules
- Source files should sequentially contain copyright, package, import, and top-level elements, separated by blank lines.
【Correct Example】
// Part 1: Copyright
/*
* Copyright (c) [Year of First Publication]-[Year of Latest Update]. [Company Name]. All rights reserved.
*/
// Part 2: Package declaration
package com.myproduct.mymodule
// Part 3: Imports
import std.collection.HashMap // Standard library
// Part 4: Public elements
public class ListItem <: Component {
// CODE
}
// Part 5: Internal elements
class Helper {
// CODE
}
Note:
The formatter doesn’t enforce blank lines after copyright, but preserves one blank line if present.
- Consistent 4-space indentation.
【Correct Example】
class ListItem {
    var content: Array<Int64> // Correct: 4-space indent relative to class
    init(
        content: Array<Int64>, // Correct: 4-space indent for parameters
        isShow!: Bool = true,
        id!: String = ""
    ) {
        this.content = content
    }
}
- Uniform brace style (K&R for non-empty blocks).
【Correct Example】
enum TimeUnit { // Correct: Opening brace on same line with 1 preceding space
Year | Month | Day | Hour
} // Correct: Closing brace on own line
class A { // Correct: Opening brace on same line
var count = 1
}
func fn(a: Int64): Unit { // Correct: Opening brace on same line
if (a > 0) { // Correct: Opening brace on same line
// CODE
} else { // Correct: Closing brace and 'else' on same line
// CODE
} // Correct: Closing brace on own line
}
// Lambda functions
let add = {
base: Int64, bonus: Int64 => // Correct: Lambda follows K&R style
print("Correct news")
base + bonus
}
- Use spaces to highlight keywords per specification G.FMT.10.
【Correct Example】
var isPresent: Bool = false // Correct: Space after colon
func method(isEmpty!: Bool): RetType { ... } // Correct: Space after colon in params/return
method(isEmpty: isPresent) // Correct: Space after colon in named args
0..MAX_COUNT : -1 // Correct: No spaces around range operator, spaces around step colon
var hundred = 0
do { // Correct: Space between 'do' and brace
hundred++
} while (hundred < 100) // Correct: Space between 'while' and paren
func fn(paramName1: ArgType, paramName2: ArgType): ReturnType { // Correct: No inner paren spaces
...
for (i in 1..4) { // Correct: No spaces around range operator
...
}
}
let listOne: Array<Int64> = [1, 2, 3, 4] // Correct: No inner bracket/paren spaces
let salary = base + bonus // Correct: Spaces around binary operators
x++ // Correct: No space for unary operators
- Minimize unnecessary blank lines for compact code.
【Incorrect Example】
class MyApp <: App {
let album = albumCreate()
let page: Router
// Blank line
// Blank line
// Blank line
init() { // Incorrect: Consecutive blank lines in type
this.page = Router("album", album)
}
override func onCreate(): Unit {
println( "album Init." ) // Incorrect: Blank lines inside braces
}
}
- Remove unnecessary semicolons for conciseness.
【Before Formatting】
package demo.analyzer.filter.impl; // Redundant semicolon
internal import demo.analyzer.filter.StmtFilter; // Redundant semicolon
internal import demo.analyzer.CJStatment; // Redundant semicolon
func fn(a: Int64): Unit {
println( "album Init." );
}
【After Formatting】
package demo.analyzer.filter.impl // Redundant semicolon removed
internal import demo.analyzer.filter.StmtFilter // Redundant semicolon removed
internal import demo.analyzer.CJStatment // Redundant semicolon removed
func fn(a: Int64): Unit {
println("album Init.");
}
- Modifier keyword ordering per specification G.FMT.12.
Recommended top-level element modifier priority:
public
open/abstract
Recommended instance member function/property modifier priority:
public/protected/private
open
override
Recommended static member function modifier priority:
public/protected/private
static
redef
Recommended member variable modifier priority:
public/protected/private
static
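The priority lists above amount to a stable sort by group. The Python sketch below reorders a declaration's modifiers accordingly; PRIORITY and order_modifiers are hypothetical names, the category keys are labels chosen for this example, and placing unlisted modifiers last in their original order is an assumption.

```python
# Modifier groups in recommended order, one table per declaration kind.
ACCESS = ["public", "protected", "private"]
PRIORITY = {
    "top_level": [["public"], ["open", "abstract"]],
    "instance_member": [ACCESS, ["open"], ["override"]],
    "static_member": [ACCESS, ["static"], ["redef"]],
    "member_variable": [ACCESS, ["static"]],
}


def order_modifiers(kind, modifiers):
    """Sort modifiers by group priority; unknown modifiers go last (assumed)."""
    groups = PRIORITY[kind]
    rank = {m: i for i, group in enumerate(groups) for m in group}
    # sorted() is stable, so modifiers with equal rank keep their relative order.
    return sorted(modifiers, key=lambda m: rank.get(m, len(groups)))


print(order_modifiers("instance_member", ["override", "open", "public"]))
print(order_modifiers("static_member", ["redef", "static", "protected"]))
```

The first call prints the modifiers in the order public, open, override; the second prints protected, static, redef.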
- Multi-line comment formatting
Comments starting with * will align the * characters. Other comments preserve original formatting. Excess spaces after * are removed.
// Before formatting
/*
* comment
*/
/*
comment
*/
// After formatting
/*
* comment
*/
/*
comment
*/
Important Notes
- The Cangjie formatter currently doesn’t support formatting syntactically incorrect code.
- The Cangjie formatter currently doesn’t support metaprogramming formatting.
HLE Tool User Guide
Introduction
HLE (HyperlangExtension) is a tool for automatically generating interoperability code templates for Cangjie calling ArkTS or C language.
The input of this tool is the interface declaration file of ArkTS or C language, such as files ending with .d.ts, .d.ets or .h, and the output is a cj file, which stores the generated interoperability code. If the generated code is a glue layer code from ArkTS to Cangjie, the tool will also output a json file containing all the information of the ArkTS file. For the conversion rules from ArkTS to Cangjie, please refer to: ArkTS Third-Party Module Generation Cangjie Glue Code Rules. For the conversion rules from C language to Cangjie, please refer to: C Language Conversion to Cangjie Glue Code Rules.
Instructions
Dependencies
- This tool requires Node.js for execution:
Recommended version: v18.14.1 or higher. Lower versions may fail to parse certain ArkTS syntax, so using the latest version is advised.
For example, use the following commands:
# Download and install nvm:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
# in lieu of restarting the shell
\. "$HOME/.nvm/nvm.sh"
# Download and install Node.js:
nvm install 22
# Verify the Node.js version:
node -v # Should print "v22.17.1".
nvm current # Should print "v22.17.1".
# Verify npm version:
npm -v # Should print "10.9.2".
- This tool requires TypeScript and cjbind for execution:
After installing Node.js, use the following commands to install TypeScript and cjbind:
cd ${CANGJIE_HOME}/tools/dtsparser/
npm install
Parameter Meaning
| Parameter | Meaning | Parameter Type | Description |
|---|---|---|---|
| -i | Absolute path of the input d.ts, d.ets or .h file | Optional | Specify -i, -d, or both |
| -r | Absolute path of the typescript compiler | Required | Used only when generating ArkTS Cangjie bindings |
| -d | Absolute path of the folder containing the input d.ts, d.ets or .h files | Optional | Specify -i, -d, or both |
| -o | Directory to save the output interoperability code | Optional | Output to the current directory by default |
| -j | Path to analyze d.ts or d.ets files | Optional | Used only when generating ArkTS Cangjie bindings |
| --module-name | Custom generated Cangjie package name | Optional | NA |
| --lib | Generate third-party library code | Optional | Used only when generating ArkTS Cangjie bindings |
| -c | Generate C to Cangjie binding code | Optional | Used only when generating C language Cangjie bindings |
| -b | Specify the path of the cjbind binary | Optional | Used only when generating C language Cangjie bindings |
| --clang-args | Parameters that will be directly passed to clang | Optional | Used only when generating C language Cangjie bindings |
| --no-detect-include-path | Disable automatic include path detection | Optional | Used only when generating C language Cangjie bindings |
| --help | Help option | Optional | NA |
Command Line
You can use the following command to generate ArkTS to Cangjie binding code:
hle -i /path/to/test.d.ts -o out -j ${CANGJIE_HOME}/tools/dtsparser/analysis.js --module-name="my_module"
On Windows, file paths currently cannot use “\” as a separator; only “/” is supported.
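Because only “/” is accepted, a path copied from a Windows file manager needs its separators converted before it is passed to the tool. The following is a minimal sketch of that conversion; the helper name toToolPath is hypothetical and is not part of the tool.

```typescript
// Hypothetical helper (not part of hle): convert Windows-style "\" separators
// to the "/" form that the tool accepts.
function toToolPath(windowsPath: string): string {
  return windowsPath.replace(/\\/g, "/");
}

console.log(toToolPath("C:\\work\\sdk\\test.d.ts")); // prints C:/work/sdk/test.d.ts
```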
The command to generate C to Cangjie binding code is as follows:
hle -b ${CANGJIE_HOME}/tools/dtsparser/node_modules/.bin/cjbind -c --module-name="my_module" -d ./tests/c_cases -o ./tests/expected/c_module/ --clang-args="-I/usr/lib/llvm-20/lib/clang/20/include/"
The -b parameter specifies the path to the cjbind binary file. The cjbind download links are as follows:
- Linux: https://gitcode.com/Cangjie-SIG/cjbind-cangjie/releases/download/v0.2.9/cjbind-linux-x64
- Windows: https://gitcode.com/Cangjie-SIG/cjbind-cangjie/releases/download/v0.2.9/cjbind-windows-x64.exe
- macOS: https://gitcode.com/Cangjie-SIG/cjbind-cangjie/releases/download/v0.2.9/cjbind-darwin-arm64
The --clang-args parameter is directly passed to clang, and the -I option can be used within its value to specify header file search paths. System header file paths are searched automatically by the program, while user-defined header file paths need to be explicitly specified.
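When the command is assembled from a script, the flags for a C binding run can be collected in one place. The sketch below is illustrative only: the option bag and the function name buildCBindingArgs are hypothetical, and joining several -I options with spaces inside --clang-args is an assumption. Only the flags themselves (-b, -c, -d, -o, --module-name, --clang-args) come from the parameter table.

```typescript
// Hypothetical option bag; the field names are illustrative.
interface CBindingOptions {
  cjbindPath: string;    // value for -b
  inputDir: string;      // value for -d
  outputDir: string;     // value for -o
  moduleName: string;    // value for --module-name
  includeDirs: string[]; // user-defined header search paths
}

function buildCBindingArgs(opts: CBindingOptions): string[] {
  // Each user-defined include directory becomes one -I option for clang.
  // Assumption: multiple -I options may be separated by spaces.
  const clangArgs = opts.includeDirs.map((dir) => `-I${dir}`).join(" ");
  const args = [
    "-b", opts.cjbindPath,
    "-c",
    `--module-name=${opts.moduleName}`,
    "-d", opts.inputDir,
    "-o", opts.outputDir,
  ];
  if (clangArgs.length > 0) {
    args.push(`--clang-args=${clangArgs}`);
  }
  return args;
}

const exampleArgs = buildCBindingArgs({
  cjbindPath: "./cjbind",
  inputDir: "./headers",
  outputDir: "./out",
  moduleName: "my_module",
  includeDirs: ["/opt/sdk/include"],
});
console.log(["hle", ...exampleArgs].join(" "));
```

System header paths are left out of includeDirs because the tool detects them automatically; only user-defined paths need to be listed.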
ArkTS Third-Party Module Generation Rules for Cangjie Glue Code
Top-Level Declarations
| .d.ts | Supported Scope | Specifications |
|---|---|---|
| Namespace | None | Not supported |
| Global Functions | Supports overloading, supports generic functions | |
| Global Variables | Requires manual modification to correct initialization values | Does not support generic type global variables |
| Interfaces | Supports basic type interfaces, optional properties, readonly properties, member functions, generics, function overloading, array types, inheritance, nested objects | Does not support index signatures, dynamic properties, function types, constructors, declaration merging |
| Type Aliases | Supports enum type aliases, class type aliases, function type aliases, union type aliases | Does not support object literal type aliases, type aliases within namespaces, intersection type aliases, generic type aliases |
| Classes | Supports constructors, static members, private members, protected members, private properties, generic members, abstract classes, class implementing interfaces, class inheritance, overloaded methods | Does not support decorated classes, types with namespaces |
| Enums | Supports string enums, numeric enums, const enums, heterogeneous enums | Does not support computed value enums. In heterogeneous enums, enum values will be uniformly converted to string type, requiring manual conversion during usage |
| Imports | Supported | |
| Exports | None | Not supported |
Namespace
Currently not supported
Global Functions
- Supports overloading.
- Parameter and return value types supported: basic types, function types, tuple types, optional types, generic functions.
- Union types (parameters of union type will be mapped to multiple type overloads).
Example:
.d.ts code:
declare function greeter(fn: (a: string) => void): void;
declare function printToConsole(s: string): void;
Generated Cangjie code:
import ohos.ark_interop.*
import ohos.ark_interop_helper.*
import ohos.base.*
/***********METHOD***********/
/**
* @brief greeter(fn: (a: string) => void): void
*/
public func greeter(fn: (a: String) -> Unit): Unit {
hmsGlobalApiCall < Unit >( "_ark_interop_api", "greeter", { ctx =>[ctx.function({ ctx, info =>
let p0 = String.fromJSValue(ctx, info[0])
fn(p0)
ctx.undefined().toJSValue()
}).toJSValue()] })
}
/**
* @brief printToConsole(s: string): void
*/
public func printToConsole(s: String): Unit {
hmsGlobalApiCall < Unit >( "_ark_interop_api", "printToConsole", { ctx =>[s.toJSValue(ctx)] })
}
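The union-type rule listed above (a union-typed parameter is mapped to multiple overloads) can be pictured on the TypeScript side as follows. This is only a sketch of the idea: the function names are hypothetical, and the exact overloads the tool emits in Cangjie are not shown in this document.

```typescript
// Original style: one function whose parameter is a union type.
function describeValue(value: string | number): string {
  return typeof value === "string" ? `text:${value}` : `number:${value}`;
}

// Equivalent overload set: one signature per union member. This mirrors how a
// union-typed parameter is spread across several overloads in the glue code.
function describeValueOverloaded(value: string): string;
function describeValueOverloaded(value: number): string;
function describeValueOverloaded(value: string | number): string {
  return describeValue(value);
}

console.log(describeValueOverloaded("a")); // prints text:a
console.log(describeValueOverloaded(1));   // prints number:1
```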
Generic function example:
.d.ts code:
declare function testMultiGenericT<T, M>(t: T, m: M): T;
Generated Cangjie code:
/**
* @brief testMultiGenericT(t: T, m: M): T
*/
public func testMultiGenericT < T, M >(t: T, m: M): T where T <: JSInteropType<T>, M <: JSInteropType<M> {
hmsGlobalApiCall < T >( "my_module_genericFunction", "testMultiGenericT", { ctx =>[t.toJSValue(ctx), m.toJSValue(ctx)] }) {
ctx, info => T.fromJSValue(ctx, info)
}
}
Global Variables
- Since global variable declarations do not include initial values, the generated Cangjie code requires users to complete the initialization values.
Example:
.d.ts code:
declare var foo: number;
declare const goo: number;
declare let qoo: number;
Generated Cangjie code:
public const foo = !!!!!check in dts!!!!!
public const goo = !!!!!check in dts!!!!!
public const qoo = !!!!!check in dts!!!!!
Interfaces
- Supports basic types, optional properties, readonly properties, member functions, generics, function overloading, array types, inheritance, nested objects.
- Does not support index signatures, dynamic properties, function types, constructors, declaration merging.
Basic Types
.d.ts code:
interface GreetingSettings {
greeting: string;
duration?: number;
color?: string;
}
Generated Cangjie code:
public class GreetingSettings {
protected GreetingSettings(public var greeting: String,
public var duration!: Option<Float64> = None,
public var color!: Option<String> = None) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["greeting"] = greeting.toJSValue(context)
if(let Some(v) <- duration) {
obj["duration"] = v.toJSValue(context)
}
if(let Some(v) <- color) {
obj["color"] = v.toJSValue(context)
}
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): GreetingSettings {
let obj = input.asObject()
GreetingSettings(
String.fromJSValue(context, obj["greeting"]),
duration: if(obj["duration"].isUndefined()) {
None
} else {
Float64.fromJSValue(context, obj["duration"])
},
color: if(obj["color"].isUndefined()) {
None
} else {
String.fromJSValue(context, obj["color"])
}
)
}
}
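The generated toJSValue and fromJSValue treat optional properties asymmetrically: a property is written only when it has a value, and a missing property is read back as "no value". The sketch below restates that logic in TypeScript with plain objects. The names GreetingSettingsLike, toPlainObject, and fromPlainObject are hypothetical and stand in for the generated Cangjie members.

```typescript
interface GreetingSettingsLike {
  greeting: string;
  duration?: number;
  color?: string;
}

// Write only the properties that are present, as the generated toJSValue does.
function toPlainObject(settings: GreetingSettingsLike): Record<string, unknown> {
  const obj: Record<string, unknown> = { greeting: settings.greeting };
  if (settings.duration !== undefined) {
    obj["duration"] = settings.duration;
  }
  if (settings.color !== undefined) {
    obj["color"] = settings.color;
  }
  return obj;
}

// Treat an undefined property as "no value", as fromJSValue does with None.
function fromPlainObject(obj: Record<string, unknown>): GreetingSettingsLike {
  const result: GreetingSettingsLike = { greeting: String(obj["greeting"]) };
  if (obj["duration"] !== undefined) {
    result.duration = Number(obj["duration"]);
  }
  if (obj["color"] !== undefined) {
    result.color = String(obj["color"]);
  }
  return result;
}
```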
Optional Properties
.d.ts code:
// product.d.ts
interface Product {
price?: number; // Optional property
}
Generated Cangjie code:
public class Product {
protected Product(public var price!: Option<Float64> = None) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
if(let Some(v) <- price) {
obj["price"] = v.toJSValue(context)
}
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): Product {
let obj = input.asObject()
Product(
price: if(obj["price"].isUndefined()) {
None
} else {
Float64.fromJSValue(context, obj["price"])
}
)
}
}
Readonly Properties
.d.ts code:
// point.d.ts
interface Point {
readonly x: number;
readonly y: number;
}
Generated Cangjie code:
public class Point {
protected Point(public let x: Float64,
public let y: Float64) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["x"] = x.toJSValue(context)
obj["y"] = y.toJSValue(context)
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): Point {
let obj = input.asObject()
Point(
Float64.fromJSValue(context, obj["x"]),
Float64.fromJSValue(context, obj["y"])
)
}
}
Function Types
.d.ts code:
// callback.d.ts
interface Callback {
(data: string): void;
}
Currently not supported
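Since function type aliases are listed as supported under Type Aliases, one possible workaround is to express the same call signature as a function type alias. This is a sketch under the assumption that the declaration file can be edited; the alias name DataCallback is hypothetical.

```typescript
// Callable interface form (reported above as not supported):
//   interface Callback { (data: string): void; }
// Function type alias form with the same call signature:
type DataCallback = (data: string) => void;

const received: string[] = [];
const collect: DataCallback = (data) => {
  received.push(data);
};
collect("hello");
console.log(received.length); // prints 1
```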
Member Functions
.d.ts code:
// person.d.ts
interface Person {
name: string;
greet(): string;
}
Generated Cangjie code:
public class Person {
protected Person(let arkts_object: JSObject) {}
public mut prop name: String {
get() {
checkThreadAndCall < String >(getMainContext()) {
ctx: JSContext => String.fromJSValue(ctx, arkts_object["name"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["name"] = v.toJSValue(ctx)
}
}
}
/**
* @brief greet(): String
*/
public func greet(): String {
jsObjApiCall < String >( arkts_object, "greet", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Person {
Person(input.asObject())
}
}
Function Overloading
.d.ts code:
// calculator.d.ts
interface Calculator {
add(x: number, y: number): number;
add(x: string, y: string): string;
}
Generated Cangjie code:
public class Calculator {
protected Calculator(let arkts_object: JSObject) {}
/**
* @brief add(x: number,y: number): number
*/
public func add(x: Float64, y: Float64): Float64 {
jsObjApiCall < Float64 >( arkts_object, "add", { ctx =>[x.toJSValue(ctx), y.toJSValue(ctx)] })
}
/**
* @brief add(x: string,y: string): String
*/
public func add(x: String, y: String): String {
jsObjApiCall < String >( arkts_object, "add", { ctx =>[x.toJSValue(ctx), y.toJSValue(ctx)] })
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Calculator {
Calculator(input.asObject())
}
}
Array Types
.d.ts code:
// list.d.ts
interface List {
items: string[];
add(item: string): void;
}
Generated Cangjie code:
public class List {
protected List(let arkts_object: JSObject) {}
public mut prop items: Array<String> {
get() {
checkThreadAndCall < Array<String> >(getMainContext()) {
ctx: JSContext => Array<String>.fromJSValue(ctx, arkts_object["items"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["items"] = v.toJSValue(ctx)
}
}
}
/**
* @brief add(item: string): void
*/
public func add(item: String): Unit {
jsObjApiCall < Unit >( arkts_object, "add", { ctx =>[item.toJSValue(ctx)] })
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): List {
List(input.asObject())
}
}
Inheritance
.d.ts code:
interface A {
p: number;
}
interface B extends A {
p1: number
}
interface C {
f(): void
}
interface D extends C {
}
interface E extends A {
}
interface F extends C {
g(): void
}
Generated Cangjie code:
public open class A {
protected A(public var p: Float64) {}
public open func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["p"] = p.toJSValue(context)
obj.toJSValue()
}
public static func fromJSValue(context: JSContext, input: JSValue): A {
let obj = input.asObject()
A(
Float64.fromJSValue(context, obj["p"])
)
}
}
/*interface B {
p1: number;
}*/
public open class B <: A {
protected B(p: Float64,
public var p1: Float64) { super(p) }
public open func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["p"] = p.toJSValue(context)
obj["p1"] = p1.toJSValue(context)
obj.toJSValue()
}
public static func fromJSValue(context: JSContext, input: JSValue): B {
let obj = input.asObject()
B(
Float64.fromJSValue(context, obj["p"]),
Float64.fromJSValue(context, obj["p1"])
)
}
}
/*interface C {
f(): void
}*/
public open class C {
protected C(public var arkts_object: JSObject) {}
/**
* @brief f(): void
*/
public func f(): Unit {
jsObjApiCall < Unit >( arkts_object, "f", emptyArg)
}
public open func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue(context: JSContext, input: JSValue): C {
C(input.asObject())
}
}
/*interface D {
}*/
public open class D <: C {
protected D(arkts_object: JSObject) { super(arkts_object) }
public open func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue(context: JSContext, input: JSValue): D {
D(input.asObject())
}
}
/*interface E {
}*/
public open class E <: A {
protected E(p: Float64) { super(p) }
public open func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["p"] = p.toJSValue(context)
obj.toJSValue()
}
public static func fromJSValue(context: JSContext, input: JSValue): E {
let obj = input.asObject()
E(
Float64.fromJSValue(context, obj["p"])
)
}
}
/*interface F {
g(): void
}*/
public open class F <: C {
protected F(arkts_object: JSObject) { super(arkts_object) }
/**
* @brief g(): void
*/
public func g(): Unit {
jsObjApiCall < Unit >( arkts_object, "g", emptyArg)
}
public open func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue(context: JSContext, input: JSValue): F {
F(input.asObject())
}
}
Nested Objects
.d.ts code:
// userProfile.d.ts
interface UserProfile {
id: number;
name: string;
address: {
city: string;
zipCode: string;
};
}
Generated Cangjie code:
public open class AutoGenType0 {
protected AutoGenType0(public var city: String,
public var zipCode: String) {}
public open func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["city"] = city.toJSValue(context)
obj["zipCode"] = zipCode.toJSValue(context)
obj.toJSValue()
}
public static func fromJSValue(context: JSContext, input: JSValue): AutoGenType0 {
let obj = input.asObject()
AutoGenType0(
String.fromJSValue(context, obj["city"]),
String.fromJSValue(context, obj["zipCode"])
)
}
}
public open class UserProfile {
protected UserProfile(public var id: Float64,
public var name: String,
public var address: AutoGenType0) {}
public open func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["id"] = id.toJSValue(context)
obj["name"] = name.toJSValue(context)
obj["address"] = address.toJSValue(context)
obj.toJSValue()
}
public static func fromJSValue(context: JSContext, input: JSValue): UserProfile {
let obj = input.asObject()
UserProfile(
Float64.fromJSValue(context, obj["id"]),
String.fromJSValue(context, obj["name"]),
AutoGenType0.fromJSValue(context, obj["address"])
)
}
}
Currently not supported
Index Signatures
.d.ts code:
// dictionary.d.ts
interface Dictionary {
[key: string]: string;
}
// Usage
const dict: Dictionary = {
name: 'Alice',
job: 'Developer',
};
console.log(dict['name']); // Alice
Currently not supported
Dynamic Properties
.d.ts code:
// config.d.ts
interface Config {
[key: string]: string | number;
}
Currently not supported
Constructors
.d.ts code:
interface ClockConstructor {
new (hour: number, minute: number): ClockInterface;
}
Currently not supported
Type Aliases
- Supports enum type aliases, class type aliases, function type aliases, and union type aliases.
- Does not support object literal type aliases, type aliases for types within namespaces, intersection type aliases, or generic type aliases.
Object Literal Type Aliases
.d.ts code:
type AppConfig = {
apiUrl: string;
timeout: number;
};
declare const config: AppConfig;
Currently not supported
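Because interfaces with basic-type members are supported, the same shape can be declared as an interface instead of an object literal alias. The following is a sketch of that rewrite, assuming the declaration file can be edited; the names AppConfigShape and sampleConfig and the URL are placeholders.

```typescript
// Object literal alias form (reported above as not supported):
//   type AppConfig = { apiUrl: string; timeout: number; };
// Interface form with the same members:
interface AppConfigShape {
  apiUrl: string;
  timeout: number;
}

const sampleConfig: AppConfigShape = {
  apiUrl: "https://example.invalid/api", // placeholder URL
  timeout: 3000,
};
console.log(sampleConfig.timeout); // prints 3000
```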
Enum Type Aliases
.d.ts code:
declare enum Colors {
Red = 'RED',
Green = 'GREEN',
Blue = 'BLUE',
}
type ColorAlias = Colors;
Generated Cangjie code:
public type ColorAlias = Colors
Class Type Aliases
.d.ts code:
declare class Animal {
name: string;
constructor(name: string);
speak(): void;
}
type AnimalAlias = Animal;
Generated Cangjie code:
public type AnimalAlias = Animal
Type Aliases for Types Within Namespaces
.d.ts code:
declare namespace Shapes {
type Circle = { radius: number };
type Rectangle = { width: number; height: number };
}
type CircleAlias = Shapes.Circle;
type RectangleAlias = Shapes.Rectangle;
Currently not supported
Function Type Aliases
.d.ts code:
type MathOperation = (a: number, b: number) => number;
Generated Cangjie code:
public type MathOperation = (a: Float64, b: Float64) -> Float64
Union Type Aliases
.d.ts code:
type GreetingLike = string | number;
Generated Cangjie code:
public enum GreetingLike {
| STRING(String)
| NUMBER(Float64)
public func toJSValue(context: JSContext): JSValue {
match(this) {
case STRING(x) => context.string(x).toJSValue()
case NUMBER(x) => context.number(x).toJSValue()
}
}
}
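The generated enum wraps each member of the union in its own case. The same idea can be sketched in TypeScript as a tagged union; the names TaggedGreeting, tagGreeting, and untagGreeting are hypothetical and only mirror the STRING and NUMBER cases shown above.

```typescript
type GreetingLikeValue = string | number;

// Tagged form mirroring the generated enum cases STRING and NUMBER.
type TaggedGreeting =
  | { tag: "STRING"; value: string }
  | { tag: "NUMBER"; value: number };

// Pick the case from the runtime type of the value.
function tagGreeting(input: GreetingLikeValue): TaggedGreeting {
  if (typeof input === "string") {
    return { tag: "STRING", value: input };
  }
  return { tag: "NUMBER", value: input };
}

// Unwrap the case again, analogous to what toJSValue does per case.
function untagGreeting(tagged: TaggedGreeting): GreetingLikeValue {
  return tagged.value;
}
```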
Intersection Type Aliases
.d.ts code:
// point.d.ts
interface Point {
readonly x: number;
readonly y: number;
}
Currently not supported
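The sample above declares a plain interface. For reference, an intersection type alias, the construct this rule refers to, combines several types with the & operator. A minimal sketch with hypothetical names:

```typescript
interface HasPosition {
  x: number;
  y: number;
}

interface HasLabel {
  label: string;
}

// An intersection type alias: a value must satisfy both interfaces.
type LabeledPoint = HasPosition & HasLabel;

const labeledOrigin: LabeledPoint = { x: 0, y: 0, label: "start" };
console.log(labeledOrigin.label); // prints start
```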
Generic Type Aliases
.d.ts code:
// point.d.ts
interface Point {
readonly x: number;
readonly y: number;
}
Currently not supported
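The sample above declares a plain interface. For reference, a generic type alias, the construct this rule refers to, is a type alias that takes type parameters. A minimal sketch with hypothetical names:

```typescript
// Generic type aliases: the alias itself is parameterized.
type Boxed<T> = { value: T };
type Pair<A, B> = [A, B];

const boxedCount: Boxed<number> = { value: 42 };
const labeledCount: Pair<string, number> = ["count", boxedCount.value];
console.log(labeledCount[1]); // prints 42
```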
Classes
- Supports constructors, static members, private members, protected members, private properties, generic members, abstract classes, class implementation of interfaces, class inheritance, and method overloading.
- Does not support decorated classes or classes within namespaces.
Constructors
.d.ts code:
declare class Greeter {
constructor(greeting: string);
greeting: string;
showGreeting(): void;
}
Generated Cangjie code:
public class Greeter {
protected Greeter(let arkts_object: JSObject) {}
/**
* @brief constructor(greeting: string): void
*/
public init(greeting: String) {
arkts_object = checkThreadAndCall < JSObject >(getMainContext()) {
__ctx =>
let clazz = __ctx.global["Greeter"].asClass(__ctx)
clazz.new(greeting.toJSValue(__ctx)).asObject()
}
}
public mut prop greeting: String {
get() {
checkThreadAndCall < String >(getMainContext()) {
ctx: JSContext => String.fromJSValue(ctx, arkts_object["greeting"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["greeting"] = v.toJSValue(ctx)
}
}
}
/**
* @brief showGreeting(): void
*/
public func showGreeting(): Unit {
jsObjApiCall < Unit >( arkts_object, "showGreeting", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Greeter {
Greeter(input.asObject())
}
}
Static Members
.d.ts code:
// MathUtils.d.ts
declare class MathUtils {
// Static property
static PI: number;
// Static method
static square(x: number): number;
}
Generated Cangjie code:
public class MathUtils {
protected MathUtils(let arkts_object: JSObject) {}
// Static property
public mut prop PI: Float64 {
get() {
checkThreadAndCall < Float64 >(getMainContext()) {
ctx: JSContext => Float64.fromJSValue(ctx, getClassConstructorObj("test", "MathUtils")["PI"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => getClassConstructorObj("test", "MathUtils")["PI"] = v.toJSValue(ctx)
}
}
}
/**
* @brief square(x: number): number
*/
public static func square(x: Float64): Float64 {
jsObjApiCall < Float64 >(getClassConstructorObj("test", "MathUtils"), "square", { ctx =>[x.toJSValue(ctx)] })
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): MathUtils {
MathUtils(input.asObject())
}
}
Private Members
.d.ts code:
declare class Person {
// Private property
private age: number;
}
Generated Cangjie code:
public class Person {
// No need to generate private member properties
protected Person(let arkts_object: JSObject) {}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Person {
Person(input.asObject())
}
}
Protected Members
.d.ts code:
declare class AnimalProtect {
// Protected property
protected name: string;
// Protected method
protected makeSound(): void;
}
Generated Cangjie code:
public class AnimalProtect {
protected AnimalProtect(let arkts_object: JSObject) {}
// Protected property
public mut prop name: String {
get() {
checkThreadAndCall < String >(getMainContext()) {
ctx: JSContext => String.fromJSValue(ctx, arkts_object["name"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["name"] = v.toJSValue(ctx)
}
}
}
/**
* @brief makeSound(): void
*/
public func makeSound(): Unit {
jsObjApiCall < Unit >( arkts_object, "makeSound", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): AnimalProtect {
AnimalProtect(input.asObject())
}
}
Readonly Properties
.d.ts code:
declare class Car {
// Readonly property
readonly brand: string;
name: string
}
Generated Cangjie code:
public class Car {
protected Car(let arkts_object: JSObject) {}
// Readonly property
public prop brand: String {
get() {
checkThreadAndCall < String >(getMainContext()) {
ctx: JSContext => String.fromJSValue(ctx, arkts_object["brand"])
}
}
}
public mut prop name: String {
get() {
checkThreadAndCall < String >(getMainContext()) {
ctx: JSContext => String.fromJSValue(ctx, arkts_object["name"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["name"] = v.toJSValue(ctx)
}
}
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Car {
Car(input.asObject())
}
}
Generic Members
.d.ts code:
declare class Box<T> {
// Property
value: T;
// Method
getValue(): T;
}
Generated Cangjie code:
public class Box<T> {
protected Box(let arkts_object: JSObject) {}
// Property
public mut prop value: T {
get() {
checkThreadAndCall < T >(getMainContext()) {
ctx: JSContext => T.fromJSValue(ctx, arkts_object["value"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["value"] = v.toJSValue(ctx)
}
}
}
/**
* @brief getValue(): T
*/
public func getValue(): T {
jsObjApiCall < T >( arkts_object, "getValue", emptyArg) {
ctx, info => T.fromJSValue(ctx, info)
}
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue <T>(context: JSContext, input: JSValue): Box<T> {
Box(input.asObject())
}
}
Abstract Classes
.d.ts code:
declare abstract class Shape {
// Abstract method
abstract getArea(): number;
}
Generated Cangjie code:
public open class Shape {
protected Shape(let arkts_object: JSObject) {}
/**
* @brief getArea(): number
*/
public func getArea(): Float64 {
jsObjApiCall < Float64 >( arkts_object, "getArea", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Shape {
Shape(input.asObject())
}
}
Class Implementation of Interfaces
.d.ts code:
interface Drivable {
start(): void;
stop(): void;
}
declare class Car1 implements Drivable {
start(): void;
stop(): void;
}
Generated Cangjie code:
public class Drivable {
protected Drivable(let arkts_object: JSObject) {}
/**
* @brief start(): void
*/
public func start(): Unit {
jsObjApiCall < Unit >( arkts_object, "start", emptyArg)
}
/**
* @brief stop(): void
*/
public func stop(): Unit {
jsObjApiCall < Unit >( arkts_object, "stop", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Drivable {
Drivable(input.asObject())
}
}
public class Car1 <: Drivable {
protected Car1(arkts_object: JSObject) {}
/**
* @brief start(): void
*/
public func start(): Unit {
jsObjApiCall < Unit >( arkts_object, "start", emptyArg)
}
/**
* @brief stop(): void
*/
public func stop(): Unit {
jsObjApiCall < Unit >( arkts_object, "stop", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Car1 {
Car1(input.asObject())
}
}
Class Inheritance
.d.ts code:
// Animal.d.ts
declare class Animal1 {
name: string;
constructor(name: string);
move(distance: number): void;
}
// Dog.d.ts
declare class Dog extends Animal1 {
bark(): void;
}
Generated Cangjie code:
public class Animal1 {
protected Animal1(let arkts_object: JSObject) {}
/**
* @brief constructor(name: string): void
*/
public init(name: String) {
arkts_object = checkThreadAndCall < JSObject >(getMainContext()) {
__ctx =>
let module = getJSModule(__ctx, "test", None)
let clazz = module["Animal1"].asClass(__ctx)
clazz.new(name.toJSValue(__ctx)).asObject()
}
}
public mut prop name: String {
get() {
checkThreadAndCall < String >(getMainContext()) {
ctx: JSContext => String.fromJSValue(ctx, arkts_object["name"])
}
}
set(v) {
checkThreadAndCall < Unit >(getMainContext()) {
ctx: JSContext => arkts_object["name"] = v.toJSValue(ctx)
}
}
}
/**
* @brief move(distance: number): void
*/
public func move(distance: Float64): Unit {
jsObjApiCall < Unit >( arkts_object, "move", { ctx =>[distance.toJSValue(ctx)] })
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Animal1 {
Animal1(input.asObject())
}
}
public class Dog <: Animal1 {
protected Dog(arkts_object: JSObject) {}
/**
* @brief bark(): void
*/
public func bark(): Unit {
jsObjApiCall < Unit >( arkts_object, "bark", emptyArg)
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Dog {
Dog(input.asObject())
}
}
Method Overloading
.d.ts code:
declare class Calculator {
// Method overloading
add(x: number, y: number): number;
add(x: string, y: string): string;
// Implementation
add(x: any, y: any): any;
}
Generated Cangjie code:
public class Calculator {
protected Calculator(let arkts_object: JSObject) {}
/**
* @brief add(x: number,y: number): number
*/
public func add(x: Float64, y: Float64): Float64 {
jsObjApiCall < Float64 >( arkts_object, "add", { ctx =>[x.toJSValue(ctx), y.toJSValue(ctx)] })
}
/**
* @brief add(x: string,y: string): String
*/
public func add(x: String, y: String): String {
jsObjApiCall < String >( arkts_object, "add", { ctx =>[x.toJSValue(ctx), y.toJSValue(ctx)] })
}
/**
* @brief add(x: any,y: any): any
*/
public func add(x: Any, y: Any): Any {
jsObjApiCall < Any >( arkts_object, "add", { ctx =>[x.toJSValue(ctx), y.toJSValue(ctx)] }) {
ctx, info => Any.fromJSValue(ctx, info)
}
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): Calculator {
Calculator(input.asObject())
}
}
Classes with Decorators
.d.ts code:
// LogClass.d.ts
declare function logClass(target: any): void;
@logClass
declare class MyClass {
name: string;
constructor(name: string);
}
Currently not supported
Classes with Namespaces
.d.ts code:
// Shapes.d.ts
declare namespace Shapes {
class Circle1 {
radius: number;
constructor(radius: number);
getArea(): number;
}
}
Currently not supported
Enumerations
- Supports string enums, numeric enums, const enums, and heterogeneous enums. In Cangjie glue code, all enum values in heterogeneous enums will be converted to string type. Therefore, when developers invoke the glue code, they need to manually convert non-string type enum members (e.g., number type) in heterogeneous enums to their original types as defined in ArkTS.
- Computed value enums are not supported.
String Enums
.d.ts code:
// colors.d.ts
declare enum Colors {
Red = 'RED',
Green = 'GREEN',
Blue = 'BLUE',
}
Generated Cangjie code:
public enum Colors <: ToString & Equatable < Colors > {
| Red
| Green
| Blue
func get(): String {
match(this) {
case Red => "RED"
case Green => "GREEN"
case Blue => "BLUE"
}
}
static func parse(val: String): Colors {
match(val) {
case "RED" => Red
case "GREEN" => Green
case "BLUE" => Blue
case _ => throw IllegalArgumentException("unknown value ${val}")
}
}
static func tryParse(val: ?String): ?Colors {
match(val) {
case Some(v) => parse(v)
case None => None
}
}
public func toString(): String {
get()
}
public override operator func ==(that: Colors): Bool {
match((this, that)) {
case(Red, Red) => true
case(Green, Green) => true
case(Blue, Blue) => true
case _ => false
}
}
public override operator func !=(that: Colors): Bool {
!(this == that)
}
}
Numeric Enums
.d.ts code:
// status.d.ts
declare enum Status {
Pending, // 0
Approved, // 1
Rejected, // 2
}
Generated Cangjie code:
public enum Status <: ToString & Equatable < Status > {
| Pending
| Approved
| Rejected
func get(): Int32 {
match(this) {
case Pending => 0 //todo: please check the value
case Approved => 1 //todo: please check the value
case Rejected => 2 //todo: please check the value
}
}
static func parse(val: Int32): Status {
match(val) {
case 0 => Pending //todo: please check the value
case 1 => Approved //todo: please check the value
case 2 => Rejected //todo: please check the value
case _ => throw IllegalArgumentException("unknown value ${val}")
}
}
static func tryParse(val: ?Int32): ?Status {
match(val) {
case Some(v) => parse(v)
case None => None
}
}
public func toString(): String {
match(this) {
case Pending => "Pending"
case Approved => "Approved"
case Rejected => "Rejected"
}
}
public override operator func ==(that: Status): Bool {
match((this, that)) {
case(Pending, Pending) => true
case(Approved, Approved) => true
case(Rejected, Rejected) => true
case _ => false
}
}
public override operator func !=(that: Status): Bool {
!(this == that)
}
}
Const Enums
.d.ts code:
// constants.d.ts
declare const enum Status {
Pending = 3,
Approved = 4,
Rejected = 5
}
Generated Cangjie code:
public enum Status <: ToString & Equatable < Status > {
| Pending
| Approved
| Rejected
func get(): Int32 {
match(this) {
case Pending => 3
case Approved => 4
case Rejected => 5
}
}
static func parse(val: Int32): Status {
match(val) {
case 3 => Pending
case 4 => Approved
case 5 => Rejected
case _ => throw IllegalArgumentException("unknown value ${val}")
}
}
static func tryParse(val: ?Int32): ?Status {
match(val) {
case Some(v) => parse(v)
case None => None
}
}
public func toString(): String {
match(this) {
case Pending => "Pending"
case Approved => "Approved"
case Rejected => "Rejected"
}
}
public override operator func ==(that: Status): Bool {
match((this, that)) {
case(Pending, Pending) => true
case(Approved, Approved) => true
case(Rejected, Rejected) => true
case _ => false
}
}
public override operator func !=(that: Status): Bool {
!(this == that)
}
}
Heterogeneous Enums
.d.ts code:
// response.d.ts
declare enum Response {
No = 0,
Yes = 'YES',
}
Generated Cangjie code:
public enum Response <: ToString & Equatable < Response > {
| No
| Yes
func get(): String {
match(this) {
case No => "0"
case Yes => "YES"
}
}
static func parse(val: String): Response {
match(val) {
case "0" => No
case "YES" => Yes
case _ => throw IllegalArgumentException("unknown value ${val}")
}
}
static func tryParse(val: ?String): ?Response {
match(val) {
case Some(v) => parse(v)
case None => None
}
}
public func toString(): String {
get()
}
public override operator func ==(that: Response): Bool {
match((this, that)) {
case(No, No) => true
case(Yes, Yes) => true
case _ => false
}
}
public override operator func !=(that: Response): Bool {
!(this == that)
}
}
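Because the glue code exposes every member of a heterogeneous enum as a string, the caller has to restore the original type for numeric members. The sketch below shows that conversion for the Response enum above. It is written in TypeScript for illustration, and the same idea applies in Cangjie; the names glueValues, originalValues, and restoreOriginal are hypothetical.

```typescript
// String values as exposed by the glue code for the Response enum above.
const glueValues = { No: "0", Yes: "YES" } as const;

// Original ArkTS values, used to decide which members were numeric.
const originalValues = { No: 0, Yes: "YES" } as const;

type ResponseMember = keyof typeof glueValues;

// Convert the string from the glue code back to the type declared in ArkTS.
function restoreOriginal(member: ResponseMember): string | number {
  const raw = glueValues[member];
  return typeof originalValues[member] === "number" ? Number(raw) : raw;
}

console.log(restoreOriginal("No"));  // prints 0
console.log(restoreOriginal("Yes")); // prints YES
```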
Type Mapping Relationships
The conversion of ArkTS .d.ts interfaces to interoperable Cangjie code supports the following type conversions: basic types, Array types, function types, Optional types, Object types, tuple types, Union types, and Promise types.
Unsupported types default to the JSValue type and are accompanied by a FIXME comment containing the original type from the .d.ts declaration. A warning naming the unsupported type is also printed on the command line.
- Comment format:
.d.ts code:
type TA80 = undefined;
Corresponding Cangjie code:
// The corresponding Cangjie code type is JSValue, with a FIXME comment containing the original .d.ts declared type
public type TA80 = JSValue/* FIXME: `undefined` */
- Warning message format:
WARNING: type is not supported - undefined
Basic Types
Supported data types:
| ArkTS Type | Cangjie Type |
|---|---|
| string | String |
| number | Float64 |
| boolean | Bool |
| bigint | BigInt |
| object | JSValue |
| symbol | JSValue |
| void | Unit |
| undefined | JSValue |
| any | Any |
| unknown | JSValue |
| never | JSValue |
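The table and the fallback rule can be summarized as a lookup with a default. The sketch below is a hypothetical illustration, not the tool's implementation: the names directMappings and mapBasicType are invented here, and the FIXME text follows the comment format shown earlier.

```typescript
// Direct mappings taken from the table above (Bool as in the generated code).
const directMappings: Record<string, string> = {
  string: "String",
  number: "Float64",
  boolean: "Bool",
  bigint: "BigInt",
  void: "Unit",
  any: "Any",
};

// Hypothetical helper: a type without a direct mapping falls back to JSValue
// and carries the original type in a FIXME comment.
function mapBasicType(arkTsType: string): string {
  const mapped = directMappings[arkTsType];
  if (mapped !== undefined) {
    return mapped;
  }
  return "JSValue/* FIXME: `" + arkTsType + "` */";
}

console.log(mapBasicType("number")); // prints Float64
console.log(mapBasicType("symbol")); // prints JSValue/* FIXME: `symbol` */
```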
Example:
.d.ts code:
interface BasicTypes {
numberKeyword: number;
stringKeyword: string;
booleanKeyword: boolean;
bigintKeyword: bigint;
objectKeyword: object;
symbolKeyword: symbol;
voidKeyword: void;
undefinedKeyword: undefined;
anyKeyword: any;
unknownKeyword: unknown;
neverKeyword: never;
}
Generated Cangjie code:
public class BasicTypes {
protected BasicTypes(public var numberKeyword: Float64,
public var stringKeyword: String,
public var booleanKeyword: Bool,
public var bigintKeyword: BigInt,
public var objectKeyword: JSValue/* FIXME: `object` */,
public var symbolKeyword: JSValue/* FIXME: `symbol` */,
public var voidKeyword: Unit,
public var undefinedKeyword: JSValue/* FIXME: `undefined` */,
public var anyKeyword: Any,
public var unknownKeyword: JSValue/* FIXME: `unknown` */,
public var neverKeyword: JSValue/* FIXME: `never` */) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["numberKeyword"] = numberKeyword.toJSValue(context)
obj["stringKeyword"] = stringKeyword.toJSValue(context)
obj["booleanKeyword"] = booleanKeyword.toJSValue(context)
obj["bigintKeyword"] = context.bigint(bigintKeyword).toJSValue(context)
obj["objectKeyword"] = objectKeyword
obj["symbolKeyword"] = symbolKeyword
obj["voidKeyword"] = voidKeyword.toJSValue(context)
obj["undefinedKeyword"] = undefinedKeyword
obj["anyKeyword"] = anyKeyword.toJSValue(context)
obj["unknownKeyword"] = unknownKeyword
obj["neverKeyword"] = neverKeyword
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): BasicTypes {
let obj = input.asObject()
BasicTypes(
Float64.fromJSValue(context, obj["numberKeyword"]),
String.fromJSValue(context, obj["stringKeyword"]),
Bool.fromJSValue(context, obj["booleanKeyword"]),
obj["bigintKeyword"].asBigInt(context).toBigInt(),
JSValue/* FIXME: `object` */.fromJSValue(context, obj["objectKeyword"]),
JSValue/* FIXME: `symbol` */.fromJSValue(context, obj["symbolKeyword"]),
Unit.fromJSValue(context, obj["voidKeyword"]),
JSValue/* FIXME: `undefined` */.fromJSValue(context, obj["undefinedKeyword"]),
Any.fromJSValue(context, obj["anyKeyword"]),
JSValue/* FIXME: `unknown` */.fromJSValue(context, obj["unknownKeyword"]),
JSValue/* FIXME: `never` */.fromJSValue(context, obj["neverKeyword"])
)
}
}
Array
The following four kinds of array conversion are currently supported:
| ArkTS Type | Cangjie Type |
|---|---|
| Uint8Array | Array<UInt8> |
| ArrayBuffer | Array<UInt8> |
| Float32Array | Array<Float32> |
| Basic type arrays (e.g., number[]) | Array<Float64> |
Example:
.d.ts code:
interface arrayInterface {
arrayType1: number[];
arrayType2: Uint8Array;
arrayType3: ArrayBuffer;
arrayType4: Float32Array;
}
Generated Cangjie code:
public class arrayInterface {
protected arrayInterface(public var arrayType1: Array<Float64>,
public var arrayType2: Array<UInt8>,
public var arrayType3: Array<UInt8>,
public var arrayType4: Array<Float32>) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["arrayType1"] = toJSArray(context, arrayType1)
obj["arrayType2"] = toJSArray(context, arrayType2, { ctx: JSContext, val: UInt8 => val.toJSValue(ctx) })
obj["arrayType3"] = toJSArray(context, arrayType3, { ctx: JSContext, val: UInt8 => val.toJSValue(ctx) })
obj["arrayType4"] = toJSArray(context, arrayType4, { ctx: JSContext, val: Float32 => val.toJSValue(ctx) })
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): arrayInterface {
let obj = input.asObject()
arrayInterface(
fromJSArray < Float64 >(context, obj["arrayType1"]),
fromJSArray(context, obj["arrayType2"], { ctx: JSContext, val: JSValue => UInt8.fromJSValue(ctx, val) }),
fromJSArray(context, obj["arrayType3"], { ctx: JSContext, val: JSValue => UInt8.fromJSValue(ctx, val) }),
fromJSArray(context, obj["arrayType4"], { ctx: JSContext, val: JSValue => Float32.fromJSValue(ctx, val) })
)
}
}
Function Types
- Function types are supported as interface properties and as function parameters.
- The Function type itself cannot be converted.
Interface Properties
Example:
.d.ts code:
interface TestListener {
"onStart"?: () => void;
"onDestroy"?: () => void;
onError?: (code: ErrorCode, msg: string) => void;
onTouch?: () => void;
onEvent?: (e: EventType) => void;
}
Generated Cangjie code:
public class TestListener {
protected TestListener(public var onStart!: Option<() -> Unit> = None,
public var onDestroy!: Option<() -> Unit> = None,
public var onError!: Option<(code: ErrorCode, msg: String) -> Unit> = None,
public var onTouch!: Option<() -> Unit> = None,
public var onEvent!: Option<(e: EventType) -> Unit> = None) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
if(let Some(v) <- onStart) {
obj["onStart"] = context.function({ ctx, _ =>
v()
ctx.undefined().toJSValue()
}).toJSValue()
}
if(let Some(v) <- onDestroy) {
obj["onDestroy"] = context.function({ ctx, _ =>
v()
ctx.undefined().toJSValue()
}).toJSValue()
}
if(let Some(v) <- onError) {
obj["onError"] = context.function({ ctx, info =>
let p0 = ErrorCode.fromJSValue(ctx, info[0])
let p1 = String.fromJSValue(ctx, info[1])
v(p0, p1)
ctx.undefined().toJSValue()
}).toJSValue()
}
if(let Some(v) <- onTouch) {
obj["onTouch"] = context.function({ ctx, _ =>
v()
ctx.undefined().toJSValue()
}).toJSValue()
}
if(let Some(v) <- onEvent) {
obj["onEvent"] = context.function({ ctx, info =>
let p0 = EventType.parse(Int32.fromJSValue(ctx, info[0]))
v(p0)
ctx.undefined().toJSValue()
}).toJSValue()
}
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): TestListener {
let obj = input.asObject()
TestListener(
onStart: if(obj["onStart"].isUndefined()) {
None
} else {
{ =>
checkThreadAndCall < Unit >(context, { _ =>
obj["onStart"].asFunction().call()
})
}
},
onDestroy: if(obj["onDestroy"].isUndefined()) {
None
} else {
{ =>
checkThreadAndCall < Unit >(context, { _ =>
obj["onDestroy"].asFunction().call()
})
}
},
onError: if(obj["onError"].isUndefined()) {
None
} else {
{ code: ErrorCode, msg: String =>
checkThreadAndCall < Unit >(context, { ctx =>
let arg0 = code.toJSValue(ctx)
let arg1 = msg.toJSValue(ctx)
obj["onError"].asFunction().call([arg0, arg1])
})
}
},
onTouch: if(obj["onTouch"].isUndefined()) {
None
} else {
{ =>
checkThreadAndCall < Unit >(context, { _ =>
obj["onTouch"].asFunction().call()
})
}
},
onEvent: if(obj["onEvent"].isUndefined()) {
None
} else {
{ e: EventType =>
checkThreadAndCall < Unit >(context, { ctx =>
let arg0 = e.get().toJSValue(ctx)
obj["onEvent"].asFunction().call([arg0])
})
}
}
)
}
}
Function Parameters
Example:
.d.ts code:
interface MyListener {
on(key: string, param: boolean, cb: (r: Record<string, string>) => void);
}
Generated Cangjie code:
public class MyListener {
let callbackManager = CallbackManager < String, JSValue >()
protected MyListener(let arkts_object: JSObject) {}
/**
* @brief on(key: string,param: boolean,cb: (r: Record<string, string>) => void): void
*/
public func on(key: String, param: Bool, cb: Callback1Argument<Record>): Unit {
let key = key.toString()
if(callbackManager.findCallbackObject(key, cb).isSome()) {
return
}
let jsCallback = checkThreadAndCall < JSValue >(getMainContext()) {
__ctx => __ctx.function {
__ctx: JSContext, info: JSCallInfo =>
let arg0 = Record<string, string>.fromJSValue(__ctx, info[0])
cb.invoke(arg0)
__ctx.undefined().toJSValue()
}.toJSValue()
}
callbackManager.put(key,(cb, jsCallback))
jsObjApiCall < Unit >( arkts_object, "on", { __ctx =>[key.toJSValue(__ctx), param.toJSValue(__ctx), jsCallback] })
}
func toJSValue(context: JSContext): JSValue {
arkts_object.toJSValue()
}
static func fromJSValue (context: JSContext, input: JSValue): MyListener {
MyListener(input.asObject())
}
}
Optional Types
Example:
.d.ts code:
interface Optionals {
optionalField1?: number;
optionalParam10: (a: number, b?: string) => void;
}
Generated Cangjie code:
public class Optionals {
protected Optionals(public var optionalParam10: (a: Float64, b: ?String) -> Unit,
public var optionalField1!: Option<Float64> = None) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["optionalParam10"] = context.function({ ctx, info =>
let p0 = Float64.fromJSValue(ctx, info[0])
let p1 = String.fromJSValue(ctx, info[1])
optionalParam10(p0, p1)
ctx.undefined().toJSValue()
}).toJSValue()
if(let Some(v) <- optionalField1) {
obj["optionalField1"] = v.toJSValue(context)
}
obj.toJSValue()
}
public static func fromJSValue (context: JSContext, input: JSValue): Optionals {
let obj = input.asObject()
Optionals(
{ a: Float64, b:?String =>
checkThreadAndCall < Unit >(context, { ctx =>
let arg0 = a.toJSValue(ctx)
let arg1 = b?.toJSValue(ctx)
obj["optionalParam10"].asFunction().call([arg0, arg1])
})
},
optionalField1: if(obj["optionalField1"].isUndefined()) {
None
} else {
Float64.fromJSValue(context, obj["optionalField1"])
}
)
}
}
Object Types
Example:
.d.ts code:
interface ObjectTypes<U, T> {
typeLiteral10: { x: number; y: U; };
typeLiteral20: { [p: number]: string; [p: symbol]: T };
typeLiteral30: { (): void; (number): string };
}
This type is currently not supported and is converted to JSValue by default.
Generated Cangjie code:
public class ObjectTypes<U, T> {
protected ObjectTypes(public var typeLiteral10: JSValue/* FIXME: `{ x: number; y: U }` */,
public var typeLiteral20: JSValue/* FIXME: `{ [number]: string; [symbol]: T }` */,
public var typeLiteral30: JSValue/* FIXME: `{ () => void; (number: any) => string }` */) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["typeLiteral10"] = typeLiteral10
obj["typeLiteral20"] = typeLiteral20
obj["typeLiteral30"] = typeLiteral30
obj.toJSValue()
}
public static func fromJSValue <U, T>(context: JSContext, input: JSValue): ObjectTypes<U, T> {
let obj = input.asObject()
ObjectTypes(
JSValue/* FIXME: `{ x: number; y: U }` */.fromJSValue(context, obj["typeLiteral10"]),
JSValue/* FIXME: `{ [number]: string; [symbol]: T }` */.fromJSValue(context, obj["typeLiteral20"]),
JSValue/* FIXME: `{ () => void; (number: any) => string }` */.fromJSValue(context, obj["typeLiteral30"])
)
}
}
Tuple Types
Example:
.d.ts code:
tupleType: [number, number, string];
Generated Cangjie code:
public var tupleType: Tuple<Float64, Float64, String>
Union Types
- Union types are currently supported only as type aliases and function parameters.
Example:
.d.ts code:
type ARK1 = null | number | string | boolean | Uint8Array | Float32Array | bigint;
Generated Cangjie code:
public enum ARK1 {
| NULL
| NUMBER(Float64)
| STRING(String)
| BOOLEAN(Bool)
| BYTEARRAY(Array<UInt8>)
| FLOAT32ARRAY(Array<Float32>)
| BIGINT(BigInt)
public func toJSValue(context: JSContext): JSValue {
match(this) {
case NULL => context.null().toJSValue()
case NUMBER(x) => context.number(x).toJSValue()
case STRING(x) => context.string(x).toJSValue()
case BOOLEAN(x) => context.boolean(x).toJSValue()
case BYTEARRAY(x) => context.global["Uint8Array"].asClass().new(x.toJSValue(context))
case FLOAT32ARRAY(x) => let buffer = context.arrayBuffer(acquireArrayRawData(x), x.size, { pt => releaseArrayRawData(pt)})
context.global["Float32Array"].asClass().new(buffer.toJSValue())
case BIGINT(x) => context.bigint(x).toJSValue()
}
}
}
public enum ARK2 {
| ARK1(x)
| VOID(Unit)
public func toJSValue(context: JSContext): JSValue {
match(this) {
case ARK1(x) => x.toJSValue(context)
}
}
}
Promise Types
.d.ts code:
typeReference21: Promise<T>;
Generated Cangjie code:
public var typeReference21: Promise<T>,
Intersection Types
.d.ts code:
interface IntersectionTypes<U, T> {
intersectionType: object & Record<U, T>;
}
This type is currently not supported and is converted to JSValue by default.
Generated Cangjie code:
public class IntersectionTypes<U, T> {
protected IntersectionTypes(public var intersectionType: JSValue/* FIXME: `object & HashMap<U, T>` */) {}
public func toJSValue(context: JSContext): JSValue {
let obj = context.object()
obj["intersectionType"] = intersectionType
obj.toJSValue()
}
public static func fromJSValue <U, T>(context: JSContext, input: JSValue): IntersectionTypes<U, T> {
let obj = input.asObject()
IntersectionTypes(
JSValue/* FIXME: `object & HashMap<U, T>` */.fromJSValue(context, obj["intersectionType"])
)
}
}
Imports
- Imports are currently translated, but users must still verify and modify them manually.
.d.ts code:
import { a } from '@umeng/common';
import buffer from '@ohos.buffer';
import { e } from "../g/h";
import { MyStringEnum, MyNumericEnum } from './exportAlias'; // Type Imports
declare const value1: MyStringEnum;
declare const value2: MyNumericEnum;
import * as Inheritances from './inheritances'; // Module Import
declare function createSub(): Inheritances.SubClass;
import { ExportedInterface } from './exportAlias'; // Module Augmentation
declare module './exportAlias' {
interface ExportedInterface {
myOption?: string;
}
}
Generated Cangjie code:
/***********IMPORT***********/
/*FIXME: Import details need to be verified and rewritten by user.*/
/*import { a } from '@umeng/common';*/
/*FIXME: Import details need to be verified and rewritten by user.*/
/*import buffer from '@ohos.buffer';*/
/*FIXME: Import details need to be verified and rewritten by user.*/
/*import { e } from "../g/h";*/
/*FIXME: Import details need to be verified and rewritten by user.*/
/*import { MyStringEnum, MyNumericEnum } from './exportAlias';*/
/*FIXME: Import details need to be verified and rewritten by user.*/
/*import * as Inheritances from './inheritances';*/
/*FIXME: Import details need to be verified and rewritten by user.*/
/*import { ExportedInterface } from './exportAlias';*/
/*
public const value1 = 0/* FIXME: Initialization is required */
*/
/*
public const value2 = 0/* FIXME: Initialization is required */
*/
/***********METHOD***********/
/**
* @brief createSub(): Inheritances.SubClass
*/
public func createSub(): JSValue/* FIXME: `Inheritances.SubClass` */ {
hmsGlobalApiCall < JSValue/* FIXME: `Inheritances.SubClass` */ >( "my_module_imports", "createSub", emptyArg) {
ctx, info => info
}
}
/***********OBJECT***********/
/*interface ExportedInterface {
myOption?: String;
}*/
public open class ExportedInterface {
protected ExportedInterface(public var myOption!: Option<String> = None) {}
public open func toJSValue(context: JSContext): JSValue {
let obj = context.object()
if(let Some(v) <- myOption) {
obj["myOption"] = v.toJSValue(context)
}
obj.toJSValue()
}
public static func fromJSValue(context: JSContext, input: JSValue): ExportedInterface {
let obj = input.asObject()
ExportedInterface(
myOption: Option < String >.fromJSValue(context, obj["myOption"])
)
}
}
Rules for Converting C Language to Cangjie Glue Code
HLE automatically generates glue code from C to Cangjie, supporting the translation of functions, structures, enums, and global variables. Type support includes: basic types, structure types, pointers, arrays, and strings.
Basic Types
The tool supports the following basic types:
| C Type | Cangjie Type |
|---|---|
| void | Unit |
| NULL | CPointer |
| bool | Bool |
| char | UInt8 |
| signed char | Int8 |
| unsigned char | UInt8 |
| short | Int16 |
| int | Int32 |
| unsigned int | UInt32 |
| long | Int64 |
| unsigned long | UInt64 |
| long long | Int64 |
| unsigned long long | UInt64 |
| float | Float32 |
| double | Float64 |
| int arr[10] | VArray<Int32, $10> |
Complex Types
The tool supports complex types including: struct types, pointer types, enum types, strings, and arrays.
Struct Types
.h declaration file:
struct Point {
struct {
int x;
int y;
};
int z;
};
struct Person {
int age;
};
typedef struct {
long long x;
long long y;
long long z;
} Point3D;
The corresponding generated glue code is as follows:
@C
public struct _cjbind_ty_1 {
public let x: Int32
public let y: Int32
public init(x: Int32, y: Int32) {
this.x = x
this.y = y
}
}
@C
public struct Point {
public let __cjbind_anon_1: _cjbind_ty_1
public let z: Int32
public init(__cjbind_anon_1: _cjbind_ty_1, z: Int32) {
this.__cjbind_anon_1 = __cjbind_anon_1
this.z = z
}
}
@C
public struct Person {
public let age: Int32
public init(age: Int32) {
this.age = age
}
}
@C
public struct Point3D {
public let x: Int64
public let y: Int64
public let z: Int64
public init(x: Int64, y: Int64, z: Int64) {
this.x = x
this.y = y
this.z = z
}
}
Pointer Types
.h declaration file:
void* testPointer(int a);
The generated glue code is as follows:
foreign func testPointer(a: Int32): CPointer<Unit>
Function Types
.h declaration file:
void test(int a);
The corresponding generated glue code is as follows:
foreign func test(a: Int32): Unit
Enumeration Types
.h declaration file:
enum Color {
RED,
GREEN,
BLUE = 5,
YELLOW
};
The generated glue code is as follows:
public const Color_RED: Color = 0
public const Color_GREEN: Color = 1
public const Color_BLUE: Color = 5
public const Color_YELLOW: Color = 6
public type Color = UInt32
String
.h declaration file:
void test(char* a);
The corresponding generated glue code is as follows:
foreign func test(a: CString): Unit
Global Variables
Currently, only constants of basic types in C are supported.
.h declaration file:
const int GLOBAL_CONST = 42;
The corresponding generated glue code is as follows:
public const GLOBAL_CONST: Int32 = 42
Array Type
.h declaration file:
void test(int arr[3]);
The corresponding generated glue code is as follows:
foreign func test(arr: VArray<Int32, $3>): Unit
Unsupported Specifications
Unsupported specifications include: bit fields, unions, macros, opaque types, flexible arrays, and extension types.
Bit Fields
.h declaration file:
struct X {
unsigned int isPowerOn : 1;
unsigned int hasError : 1;
unsigned int mode : 2;
unsigned int reserved : 4;
};
The corresponding generated Cangjie code is as follows; the user must correct it manually:
@C
public struct X {
let _cjbind_opaque_blob: UInt32
public init() {
this._cjbind_opaque_blob = unsafe { zeroValue<UInt32>() }
}
}
Union
.h declaration file:
union X {
int a;
void* ptr;
};
The corresponding generated Cangjie code is as follows; the user must correct it manually:
@C
public struct X {
let _cjbind_opaque_blob: UInt64
public init() {
this._cjbind_opaque_blob = unsafe { zeroValue<UInt64>() }
}
}
Macros
Cangjie currently has no suitable expression-parsing library, so the tool cannot compute macro values directly. When it encounters a macro, the current implementation skips the entire #define.
Opaque Types
.h declaration file:
typedef struct OpaqueType OpaqueType;
OpaqueType* create_opaque(int initial_value);
void set_value(OpaqueType* obj, int value);
int get_value(OpaqueType* obj);
void destroy_opaque(OpaqueType* obj);
The corresponding generated Cangjie code is as follows; the user must correct it manually:
@C
public struct OpaqueType {
init() {
throw Exception("This type should be implemented by user")
}
}
foreign func create_opaque(initial_value: Int32): CPointer<OpaqueType>
foreign func set_value(obj: CPointer<OpaqueType>, value: Int32): Unit
foreign func get_value(obj: CPointer<OpaqueType>): Int32
foreign func destroy_opaque(obj: CPointer<OpaqueType>): Unit
Flexible Arrays
.h declaration file:
typedef struct {
int length;
char data[];
} FlexibleString;
The corresponding generated Cangjie code is as follows; the user must correct it manually:
@C
public struct FlexibleString {
public let length: Int32
public let data: CPointer<UInt8>
public init(length: Int32, data: CPointer<UInt8>) {
this.length = length
this.data = data
}
}
Extension Types
.h declaration file:
#include <complex.h>
#include <stdatomic.h>
float _Complex c_float;
double _Complex c_double;
long double _Complex c_ld;
long double pi_high = 3.14159265358979323846264338327950288L;
long double Planck_constant = 6.62607015e-34L;
_Atomic(int) counter = 0;
The corresponding generated Cangjie code is as follows; the user must correct it manually:
/*FIXME: Non-constant global variable details need to be verified and rewritten by user.*/
/* float _Complex c_float */
/*FIXME: Non-constant global variable details need to be verified and rewritten by user.*/
/* double _Complex c_double */
/*FIXME: Non-constant global variable details need to be verified and rewritten by user.*/
/* long double _Complex c_ld */
/*FIXME: Non-constant global variable details need to be verified and rewritten by user.*/
/* long double pi_high = 3.14159265358979323846264338327950288L */
/*FIXME: Non-constant global variable details need to be verified and rewritten by user.*/
/* long double Planck_constant = 6.62607015e-34L */
/*FIXME: Non-constant global variable details need to be verified and rewritten by user.*/
/* _Atomic ( int ) counter = 0 */
Explanation:
- Memory Alignment: Cangjie does not provide syntax for alignment control, so the HLE tool uses the default C alignment. If the C code uses #pragma pack or __attribute__((packed)) to control alignment, the generated binding code cannot be guaranteed to be correct.
- Calling Conventions: The Cangjie documentation does not clearly describe calling conventions; in practice, the default calling convention is used. The HLE tool currently tries to infer the calling convention from the function signatures in the C code, but correctness cannot be guaranteed.
- Usage Limitation: Automatic generation of C-to-Cangjie glue code by HLE is constrained by the system glibc version and currently supports only Ubuntu 22.04 and later.
Cangjie Language Server User Guide
Overview
The Cangjie Language Server provides language service features such as definition navigation, reference lookup, and code completion based on the Cangjie language.
Usage Instructions
The Cangjie Language Server serves as the backend server for providing Cangjie language services in IDEs, requiring integration with an IDE client. Developers can use it with the VSCode plugin released by Cangjie or develop their own IDE clients that comply with the Language Server Protocol (LSP).
The startup parameters for Cangjie Language Server are as follows:
-V Optional parameter, enables crash log generation capability for LSPServer
--enable-log=<value> Optional parameter, controls whether to enable log printing. If not set, defaults to true (enabling log printing)
--log-path=<value> Optional parameter, specifies the directory for generating log files and crash logs. If not set, logs will be generated in the LSPServer's working directory by default
--disableAutoImport Optional parameter, disables automatic package import during code completion
--test Optional parameter, starts test mode for running LSPServer test cases
Usage Example
LSPServer.exe --enable-log=true --log-path=D:/CangjieLSPLog -V --disableAutoImport
This command starts LSPServer with logging and crash log generation enabled, sets the log file output directory to D:/CangjieLSPLog, and disables automatic package import during code completion.
CHIR Deserialization Tool
Overview
chir-dis is a CHIR deserialization tool provided by Cangjie, designed to deserialize compiler-output CHIR serialized information into human-readable text files for storage.
Usage Instructions
Run chir-dis -h to view command usage:
A tool used to deserialize and dump CHIR.
Overview: chir-dis xxx.chir -> xxx.chirtxt
Usage:
chir-dis [option] file
Options:
-v print compiler version information.
-h print this help.
Developers can use this tool to deserialize a single CHIR serialized file and save it in the current directory with a .chirtxt extension. The -v option allows viewing the corresponding compiler version.
Usage Example
To deserialize the compiler-output CHIR serialized file package.chir into a readable text file for inspection, execute the following command:
chir-dis package.chir
After running the above command, a text file named package.chirtxt will be generated in the current directory.