tH-Wiki will provide an easy solution for having my personal wiki, task tracker, and issue tracker – all in one. YouTrack would be the closest fit (I never tried its Knowledge Base feature, but its issue tracker was nice back then), but I do not want proprietary software.
Contents:

- Wiki
- Issues
- Attachments
- Administration
- Other
## General

- Development with a later move to GitHub in mind, maybe
  - ⇒ State: …not maybe, but for sure 🥳
- API first
- For this project, I will separate backend and frontend into two independent applications (not build an all-in-one JAR)
- Abstraction of persistence, so we may start file-based (JSON files, H2, SQLite), but can easily extend to Postgres later
  - ⇒ State: Partly implemented. The database uses JDBC for different engines; file-based with JSON is not possible. However, `DemoDataInitializer` goes in that direction. Attachments use the `Storage` interface, which is just a wrapper around Java's `FileSystem` (see the sketch below this list).
- If not persisted that way, easy import/export with JSON or YAML
  - ⇒ State: There is no specialized import/export feature yet. However, backups are easy when using the H2 database: it's just "stop and copy the whole folder".
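The `Storage` abstraction mentioned above might look roughly like this (hypothetical signatures; the source only says it wraps Java's `FileSystem`):

```kotlin
import java.nio.file.FileSystem
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical sketch of a Storage abstraction wrapping java.nio.file.FileSystem.
interface Storage {
    fun write(name: String, content: ByteArray)
    fun read(name: String): ByteArray
    fun delete(name: String)
}

class FileSystemStorage(fileSystem: FileSystem, baseDir: String) : Storage {
    private val base: Path = fileSystem.getPath(baseDir)

    override fun write(name: String, content: ByteArray) {
        Files.write(base.resolve(name), content)
    }

    override fun read(name: String): ByteArray =
        Files.readAllBytes(base.resolve(name))

    override fun delete(name: String) {
        Files.deleteIfExists(base.resolve(name))
    }
}
```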
## Wiki

- Simple input of data using Markdown
- Attachments (e.g. screenshots or configuration files)
- Powerful full-text search (looking at you, Lucene 😄)
  - ⇒ State: Not yet. On the client side we have a powerful ANTLR query language, but it only works that well because the client loads the full text of all wiki pages / issues.
- Hierarchical storage of pages
  - E.g. level 1 "Server", "Programming", "Personal" – level 2 "Proxmox", "Kotlin", "Living Room"
  - No fixed hierarchies (e.g. "Books" → "Pages"), but arbitrary depths of "folders"
- Tags as an additional mechanism to cluster content across different folders
## Issue Tracker

- Basic features of an issue tracker: projects, issues, types, statuses, comments, relations
  - ⇒ State: Comments are not implemented yet. Editing is sufficient for now.
## Task Management

- Should be an intermediate between Wiki and Issue Tracker
- Tasks need to be ordered into hierarchies like the Wiki
- Tasks need statuses/comments/relations like the Issue Tracker
- We need a "Due" field

⇒ State: A separate task management was implemented, but did not pan out. We have integrated everything we need into the issue tracker. Tasks are a separate issue type.
## Notes Management

- Similar to Wiki and Tasks
- Notes could have a hierarchy to cluster them if needed.
- Some notes are just "immediate brain dumps" without any structure.
- Notes can also have tags (like in the Wiki)
- Notes are like Wiki pages, but less structured. (Technically, they may be the same.)

⇒ State: There is no separate notes management. For now, the issue tracker has a separate issue type for notes.
### WikiPage

- `id: UUID` (unique!)
- `title: String` (unique/non-empty)
- `content: String` (Markdown, can be blank)
- `parent: UUID?` (optional parent to form trees)
- `creationTime: LocalDateTime`
- `modificationTime: LocalDateTime`
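As a Kotlin data class, this model could look like the following sketch (names follow the list above; not necessarily the actual source):

```kotlin
import java.time.LocalDateTime
import java.util.UUID

// Hypothetical sketch of the WikiPage model from the list above.
data class WikiPage(
    val id: UUID,                        // unique!
    val title: String,                   // unique, non-empty
    val content: String,                 // Markdown, can be blank
    val parent: UUID?,                   // optional parent to form trees
    val creationTime: LocalDateTime,
    val modificationTime: LocalDateTime,
)
```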
### Attachment

- `id: UUID` (unique! represents the filename under which we save the attachment in storage)
- `wikiPageId` / `issueId: UUID?` – an attachment belongs to either a wiki page or an issue (not none, not both!), so it can be referenced relative to it, e.g. an image in Markdown
  - but we keep options open for global attachments with a `NULL` value
- `filename: String` (original filename; we allow the same filename multiple times, but only once per entry, so there is no clash in Markdown rendering)
- `description: String` (description or comment the user adds to the upload)
- `lastModifiedTime: DateTime?` (file's mtime, can be `NULL` if unknown)
- `uploadTime: DateTime` (time of upload)
- `size: Long` (size of the file)
- `mimeType: String` (the MIME type the client sent; we don't validate for now, as detection is hard)
  - Can be empty if the client did not send any; in this case we deliver with `application/octet-stream`
- `imageWidth` / `imageHeight: Int?` (for images we save the dimensions of the image)
- `sha256Sum: String` (SHA-256 checksum as hex digits)
### AttachmentDeleted

- `attachmentId: UUID` (unique!)
- `deletionTime: LocalDateTime`
### Project

- `id: UUID` (unique!)
- `prefix: String` (prefix for the issue key; we only allow uppercase characters)
- `title: String` (unique)
- `description: String` (description or comment)
- `nextIssueNumber: Int` (strictly monotonically increasing counter to build issue keys from, e.g. `prefix` = "DEMO", `nextIssueNumber` = 1 → the next issue gets key "DEMO-1")
### IssueType

- `id: UUID` (unique!)
- `title: String` (unique)
- `sortIndex: Int`
- `icon: String`
- `iconColor: String`
### IssuePriority

- `id: UUID` (unique!)
- `title: String` (unique)
- `sortIndex: Int`
- `icon: String`
- `iconColor: String`
- `showIconInList: Boolean`
### IssueStatus

- `id: UUID` (unique!)
- `title: String` (unique)
- `description: String` (used for tooltips to explain the status)
- `sortIndex: Int`
- `icon: String`
- `iconColor: String`
- `doneStatus: Boolean`
### Issue

- `id: UUID` (unique!)
- `projectId: UUID` (an issue belongs to a project, we don't allow project-less issues)
- `issueNumber: Int` (unique per project, part of the issue key)
  - The issue key is the project's `prefix` + "-" + the issue's `issueNumber`
  - `issueNumber` is incremented with each new issue
- `issueKey: String` (the issue key is saved redundantly, so there is no need to JOIN the project table)
- `issueTypeId: UUID` (issue's type, e.g. Feature, Bug, Task)
- `issuePriorityId: UUID` (each issue has a mandatory priority)
- `issueStatusId: UUID` (each issue has a status)
  - There are no status workflows like in JIRA. Not sure whether we want that later.
  - Later, statuses can be configured, but you can transition from any status to any other one.
- `title: String` (non-empty, multiple issues with the same title are allowed)
- `description: String` (can be empty, Markdown)
- `creationTime: LocalDateTime`
- `modificationTime: LocalDateTime`
- `progress: Int?` (can be set optionally, e.g. on issueType = Task)
- `dueDate: LocalDate?` (soon-due/overdue issues will be highlighted in the UI)
- `doneTime: LocalDateTime?` (set when the status transitions to a status with `doneStatus`)
### IssueLinkType

- `id: UUID` (unique!)
- `type: String` (magic string for the UI to identify the different issue link types)
  - The UI can use this to display different styles in the dependency graph, or different icons in lists.
  - We don't use an enum, to be easily open for future additions.
- `sortIndex: Int`
- `wording: String` (reading the link forward: "Issue 1 *wording* Issue 2")
- `wordingInverse: String` (reading the link backward: "Issue 2 *wordingInverse* Issue 1")
  - Can be empty if the order of the issues does not matter, e.g. "relates to".
  - In general, `wording` should be active (e.g. "blocks"), `wordingInverse` passive (e.g. "is blocked by").
### IssueLink

- `id: UUID` (unique!)
- `issue1Id: UUID`
- `issue2Id: UUID`
- `issueLinkTypeId: UUID`
### Tag

- `id: UUID` (unique!)
- `projectId: UUID?` (null = global tag, non-null = project tag; cannot be changed after creation)
- `scope: String` (can be empty)
- `scopeIcon: String` (can be empty)
- `scopeColor: String` (can be empty)
- `title: String`
- `titleIcon: String` (can be empty)
- `titleColor: String`
- `description: String` (can be empty)
### TagAssociation

- `tagId: UUID`
- `wikiPageId: UUID?` (can be null)
- `issueId: UUID?` (can be null)
- Exactly one of the two nullable UUIDs must be set.
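The "exactly one target" rule could be expressed like this (hypothetical validation sketch; where the rule is actually enforced is a separate question):

```kotlin
import java.util.UUID

// Hypothetical sketch of the "exactly one target" rule for TagAssociation.
data class TagAssociation(val tagId: UUID, val wikiPageId: UUID?, val issueId: UUID?) {
    init {
        require((wikiPageId != null) xor (issueId != null)) {
            "A tag association must reference exactly one of wikiPageId or issueId."
        }
    }
}
```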
### Entry

Entry is flexible enough to hold anything. The different presentations are done by the UI, for example rendering a checkbox next to a task by loading/saving the custom field "done".

Because the "powerful full-text search" must understand the content of the data, the backend could/should validate nonsense data like "type=wiki, due=2024-06-07". However, such examples are not harmful and could even make sense, for example "Please check and rework this wiki page until 2024-06-07."

With such an approach we could easily implement "convert note to task" by changing the type. We could (should?!) even keep the ID. The UI is responsible for cleaning/enforcing certain custom fields, for example forcing/defaulting a status on "note ⇒ issue" and, vice versa, deleting the status on "issue ⇒ note".

Field definitions could be added later, for example assisting "Status" with a set of pre-defined values, or "Due" with a "Date/Time" format.

⇒ Discontinued and replaced: We refactored Entry back to WikiPage and Issue. These "flexible fields" caused us more trouble than they were worth. If we ever need a "super-object" again, we can build that with GraphQL and interfaces or union types.
Projects serve as a basis for the issue tracker.

Previously, there were entries. Now issues are disjoint from wiki pages, so it's debatable whether there will be different wikis, one per project, or not. The "everything is an entry" approach did NOT serve us well and was discontinued. Tasks have been replaced by issues, as the issue tracker is able to handle all task use cases. Notes and other future extensions will be separate as well. Shared functionality like attachments, tags, or custom fields can be implemented with different base tables like wiki_attachment and issue_attachment, or with a single table having separate columns like attachment.wiki_page_id and attachment.issue_id.

For projects we have both an id and the prefix, which is unique.

It's not yet clear whether we want frontend URLs like /issues/FOO-1 or /issues/536215eb-a23c-4b14-a0ea-c68c4ada351a (or both). To separate the two, the wording is "issue ID" (the UUID) vs. "issue key" (prefix + "-" + running number). For the API, primarily the UUID is relevant.
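As a tiny illustration of the distinction (hypothetical helper, not actual project code):

```kotlin
import java.util.UUID

// "Issue ID" (a UUID, what the API primarily uses) vs.
// "issue key" (prefix + "-" + running number, what humans use).
fun issueKey(projectPrefix: String, issueNumber: Int): String = "$projectPrefix-$issueNumber"

fun main() {
    val issueId: UUID = UUID.randomUUID()              // e.g. 536215eb-a23c-4b14-a0ea-c68c4ada351a
    println("id=$issueId, key=${issueKey("FOO", 1)}")  // key: FOO-1
}
```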
We won't (or only much later) track moving issues between projects like e.g. JIRA does. Since it's allowed to delete an issue, it's also allowed to move an issue from project A ("delete it there") to project B ("create it there").
`sortIndex` allows arranging the items in a dropdown list. It's only used for sorting and never communicated to a client.

Issue type, priority, and status have an `icon` field. It's a reference to a Font Awesome icon. We use 48 chars as column length; the longest Font Awesome icon name currently has 32 characters. Check https://github.com/FortAwesome/Font-Awesome/blob/6.x/js/all.js with the regex `\s+"[^"]{38,}":`.

`iconColor` can either be a `#rrggbb` string for a specific color, or a Bootstrap CSS class suffix like `primary` or `danger-emphasis`.

All three tables are automatically filled by the backend with default values. For now, there are no endpoints to alter these. We will later extend the functionality so the data can be altered, to provide more flexibility to the user.
Additional special fields are provided:

- `IssuePriority.showIconInList`: Indicator for the UI whether to show the icon in the list. The idea is that "normal" priorities have no icon, only lower/higher priorities do. In the issue's detail view the icon is always shown.
- `IssueStatus.description`: a description explaining what a status means. The UI can render this as a tooltip on the status.
- `IssueStatus.doneStatus`: marks a status as "done". When an issue transitions into such a status, the issue's `doneTime` is set. We use this to mark end states, which can be, for example, "Done" or "Declined".
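A sketch of the `doneTime` rule (minimal stand-ins for the real records; that `doneTime` is cleared again when leaving a done status is an assumption here):

```kotlin
import java.time.LocalDateTime
import java.util.UUID

// Minimal hypothetical stand-ins for the real Issue and IssueStatus records.
class IssueStub(var issueStatusId: UUID, var doneTime: LocalDateTime?, var modificationTime: LocalDateTime)
class StatusStub(val id: UUID, val doneStatus: Boolean)

// Entering a status with doneStatus sets doneTime; leaving it clears it (assumption).
fun transition(issue: IssueStub, newStatus: StatusStub, now: LocalDateTime = LocalDateTime.now()) {
    issue.issueStatusId = newStatus.id
    issue.doneTime = if (newStatus.doneStatus) now else null
    issue.modificationTime = now
}
```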
Tags are designed to be associable with all kinds of entities. For now, we have issues and wiki pages. It's possible to use them for attachments and future things.

There are project tags (associated with a particular project) and global tags (without a project).

By design, wiki pages can be associated with global tags. There is no special category "wiki tags". In theory, in a bigger setup, wiki pages would be associated with a project, i.e. forming one "wiki" per project. For now, we don't do that and only have one "global wiki". Thus it's consistent to use the global tags.
For associations we decided to have foreign key constraints enforced by the database; therefore each entity that can be associated with a tag has its own column in the table. The alternative (tagId, type, targetId) looks easier, but does not allow FKs. TagAssociation is the first table without a surrogate key. We don't have to keep metadata there, and consider a tag association as "belonging" to the owning entity: e.g. if an issue gets deleted, so do its tag associations, automatically.
There is a standard problem when working with a database and a filesystem: database transactions ensure ACID; the filesystem combined with the database does not.

Imagine the following scenario: In order to create a backup, we read all attachments (let's say A, B, C) from the database. After reading the database, we copy the files in the filesystem to the backup location, that is, the files for A, B, and C. If during this process a user requests "delete attachment B", this request deletes the database row for B and the file for B. The backup job won't see that row B has been deleted, because of database isolation. However, when the backup job tries to fetch the file for B, the file in the filesystem has already been deleted. We have a problem!

To compensate for this, we introduce `AttachmentDeleted`. Its sole purpose is to remember which files should have been deleted already, but are not yet. Deletion of files in the filesystem is deferred to a garbage collection job that deletes the file long after the backup has finished. This table does not need to hold any metadata of the original `Attachment`. We don't use the table for any restore mechanism, but solely for deferring filesystem deletions.

To come back to the example: When the backup job still sees the (in another transaction deleted) B, it can still access the file for B. The file for B will be deleted later.

There is another decision we made here: Instead of a `deleted` flag in the `Attachment` table, we introduce a separate table `AttachmentDeleted`. That has the advantage that we do not need to adjust any query accessing `Attachment` to filter out deleted entries.
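A minimal sketch of the deferred-deletion flow (hypothetical names; in-memory maps stand in for the two tables, and the real code uses the `Storage` interface rather than direct filesystem access):

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.time.LocalDateTime
import java.util.UUID

// Hypothetical in-memory stand-ins for the Attachment and AttachmentDeleted tables.
val attachments = mutableMapOf<UUID, String>()               // id -> filename
val attachmentDeleted = mutableMapOf<UUID, LocalDateTime>()  // attachmentId -> deletionTime

// Step 1: "delete" removes only the database row and records the pending file deletion.
fun deleteAttachment(id: UUID) {
    attachments.remove(id)
    attachmentDeleted[id] = LocalDateTime.now()  // the file stays on disk for now
}

// Step 2: a garbage-collection job removes the files long after any backup has finished.
fun garbageCollect(storageDir: Path, olderThan: LocalDateTime) {
    attachmentDeleted.filterValues { it < olderThan }.keys.toList().forEach { id ->
        Files.deleteIfExists(storageDir.resolve(id.toString()))  // files are named after the attachment UUID
        attachmentDeleted.remove(id)
    }
}
```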
- Complicated users, groups, and permissions ⇒ For now, it's only for me.
- We may start with fixed issue types/statuses/relations. Later this can be extended to be configurable, then even configurable per project.
- "Scrum"-ish issue tracker with dashboards and reports
- We will start without a repository/service layer, putting all code into the controllers. Additional layers/classes will only be introduced if there is a need for such code.
  - Consequence: Testing will be done on HTTP level, testing the controllers directly.
- GraphQL API
  - The initially used REST API has been replaced by a GraphQL API. Reasons:
    - With the previous REST approach, the frontend did a lot of (copy & paste) multiple requests to get all the data, e.g. loading issue types, issue priorities, and issue statuses to fill dropdowns or render an issue. GraphQL allows fetching everything in one request.
    - The frontend had different use cases that fetch different subsets of properties, e.g. for a list, expensive fields like `content`/`description` are not needed, while for rendering a single issue the `description` was fetched. Providing `?fields` functionality had proven quite annoying to implement, and had to be done endpoint by endpoint. GraphQL provides field-wise selection to the client out of the box (yes, with the general work we had to do for GraphQL, once).
    - We do GraphQL right! In contrast to many implementations out there (because of Hibernate, or even the Spring GraphQL docs showing examples with full objects), we don't load full objects and let the framework throw away a lot of data, but rather tailor our SQL queries to what's really requested by the GraphQL request.
      - Important: Sometimes ID columns must be loaded regardless of whether they were requested!
        Imagine the following query:

        ```graphql
        {
          issues {
            title
            project {
              prefix
            }
            issueLinks {
              # id <-- without "id" requested, IssueLink.id is not loaded.
              issueLinkType {
                wording
                # <-- But without knowing IssueLink.id, how would we load the associated issueLinkType?
                # The DataFetcher would not find anything, wording would be null, which is incorrect!
              }
            }
            issueNumber
          }
        }
        ```
    - GraphQL easily lets us return multiple errors at once, e.g. two missing fields. (We could have done this with REST as well, but GraphQL immediately offers us a) an `errors` property, b) an array for multiple errors.)
  - Deletion mutations will have an `id` field only. There is no need to return the removed object, so we don't have to fetch it first.
  - File uploads: GraphQL does not support file uploads. There are different workarounds available, see 1, 2, 3.
    - Multipart requests: would open CSRF vulnerabilities. Spring does not have direct support, pointing instead to multipart-spring-graphql, a project with (as of now) only 15 stars. ⇒ Nope
    - Cloud services ⇒ Nooooope! 😱
    - Base64 string uploads: This is the easiest solution, and since we are not expecting lots of traffic and/or gigantic files, we go with this approach. No additional configuration needed, no deviation from the GraphQL spec. (See the sketch right below.)
      - Remark: We could try out Base85 (needs a special "quotes/backslash to something else" replacement to work in JSON data)
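      A minimal sketch of what the Base64 approach boils down to on the server (hypothetical function names, plain `java.util.Base64`):

      ```kotlin
      import java.util.Base64

      // Hypothetical sketch: the mutation input carries the file content as a Base64 string,
      // which the backend simply decodes before handing it to storage.
      fun decodeUpload(base64Content: String): ByteArray =
          Base64.getDecoder().decode(base64Content)

      fun main() {
          val sent = Base64.getEncoder().encodeToString("hello".toByteArray())
          println(decodeUpload(sent).decodeToString())  // prints "hello"
      }
      ```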
  - File downloads: File uploads are done as Base64 strings. We don't go the symmetric way for file downloads, but rather provide the usual GET endpoints. These URLs or file names will be provided by GraphQL responses.
    - (❗) NOTE: This had to be re-evaluated. First, we deleted the REST API and with it the above-mentioned download functionality. Now, we have re-introduced the GET endpoint. Users can access an attachment by its ID. No separate URL/filename in the GraphQL response. Reasons:
      - Browsers can easily render a file with a `Content-Type` header. With Base64 decode magic we would need to send the `Content-Type` separately, and put effort into letting the browser know.
      - No change to existing code necessary.
      - Files can easily be accessed outside the API by just typing the URL into the browser.
  - GraphQL types should match our database types. For example, having `issueKey` (= project's `prefix` + "-" + issue's `issueNumber`) needs an additional JOIN from `issue` to `project`. This was bumpy in the REST implementation already, and would be even harder in the GraphQL implementation. The solution is to a) not provide the field anymore, or b) persist it redundantly in the database (as `issue.issue_key`).
  - "Testing will be done on HTTP level" also applies to GraphQL. We don't use Spring's `HttpGraphQlTester`.
  - Naming:
    - Mutations start with a verb.
    - For mutations, the input argument is just called `input`.
    - A mutation can have different arguments instead of `input`, but in general we don't want many arguments. If there are more than two, there should be an `input` argument instead. However, we are not as strict as GitHub's API, where all mutations have only a single `input` argument.
      - Thoughts: An `input` type can be extended more easily. We only choose separate arguments when we are sure there won't be a change in the future. For example, a delete mutation only accepts an `id` parameter; no need for a separate `input` argument with only one field.
    - All mutations shall have a `...Response` type as a result, not a direct result like `Project` or `DeletionResult`. No two mutations may share the same type as a result. This allows us non-breaking schema changes in the future (by adding a new field and deprecating an old one).
    - See https://www.apollographql.com/docs/graphos/schema-design/guides/naming-conventions
- Regarding "not found":
- Querying a non-existing entity is not an error when the schema specifies a nullable value.
No entry is added to
errors. - When referencing a non-existing entity as input in a mutation, it's an error.
errorswill contain aNOT_FOUNDclassified field (in contrast to other invalid input which is classifiedBAD_REQUEST, e.g. "number out of range", "date in invalid format").
- Querying a non-existing entity is not an error when the schema specifies a nullable value.
No entry is added to
DataLoaders do not necessarily go into the same file as their@SchemaMapping.@SchemaMappings are grouped by theirtypeName, e.g.@SchemaMapping(typeName = "WikiPage", field = "attachments")needs to go intoWikiPageController.DataLoaders go into "their" controller. That is because the controller knows how to resolve the fields. E.g.@SchemaMapping(typeName = "WikiPage", field = "attachments")(defined intoWikiPageController) wants to loadAttachments. The neededDataLoaderfor these is defined inAttachmentsController.
  - General `Controller` layout, in order (see the skeleton below):
    - `init` block for defining `DataLoader`s and registering them in the `batchLoaderRegistry`
    - `@SchemaMapping`s (order like in schema definition, `extend type`s last)
    - `@QueryMapping`s (order like in schema definition)
    - `@MutationMapping`s (order like in schema definition)
    - `determineFieldsToLoad()` methods (from `DataFetchingFieldSelectionSet`, from `BatchLoaderEnvironment`, from `Set<String>`)
    - inner class `DataLoaders` using `determineFieldsToLoad()`
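    A skeleton of that layout (hypothetical `Foo`/`Bar` types; the Spring for GraphQL APIs used are real, the bodies are stubs, not the project's actual code):

    ```kotlin
    import graphql.schema.DataFetchingFieldSelectionSet
    import org.dataloader.BatchLoaderEnvironment
    import org.springframework.graphql.data.method.annotation.Argument
    import org.springframework.graphql.data.method.annotation.MutationMapping
    import org.springframework.graphql.data.method.annotation.QueryMapping
    import org.springframework.graphql.data.method.annotation.SchemaMapping
    import org.springframework.graphql.execution.BatchLoaderRegistry
    import org.springframework.stereotype.Controller
    import reactor.core.publisher.Mono
    import java.util.UUID

    data class Foo(val id: UUID) // hypothetical GraphQL POJO
    data class Bar(val id: UUID) // hypothetical child type resolved via DataLoader

    @Controller
    class FooController(batchLoaderRegistry: BatchLoaderRegistry) {

        // 1. init block: define DataLoaders and register them
        init {
            batchLoaderRegistry.forTypePair(UUID::class.java, Bar::class.java)
                .registerMappedBatchLoader { ids, env -> Mono.just(DataLoaders().loadBars(ids, env)) }
        }

        // 2. @SchemaMappings (order like in schema definition)
        @SchemaMapping(typeName = "Foo", field = "bar")
        fun bar(foo: Foo): Bar = TODO("delegate to the registered DataLoader")

        // 3. @QueryMappings
        @QueryMapping
        fun foos(selectionSet: DataFetchingFieldSelectionSet): List<Foo> = TODO()

        // 4. @MutationMappings
        @MutationMapping
        fun createFoo(@Argument input: Map<String, Any>): Foo = TODO()

        // 5. determineFieldsToLoad() overloads
        private fun determineFieldsToLoad(selectionSet: DataFetchingFieldSelectionSet): Set<String> = TODO()

        // 6. inner class with the DataLoader implementations
        inner class DataLoaders {
            fun loadBars(ids: Set<UUID>, env: BatchLoaderEnvironment): Map<UUID, Bar> = TODO()
        }
    }
    ```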
- Issues have been live in my production wiki since the end of January 2025, and are performing extremely well 🥳 Starting right now (2025-02-09), issue references are used in commit messages if there is a corresponding issue in the tracker.
- Naming strategy for database `CONSTRAINT`s and `INDEX`es:
  - `pk__table` (PRIMARY KEY)
  - `fk__table__reference` (FOREIGN KEY): `reference` usually is the name of the referenced table, but can contain additional discriminators, e.g. `fk__issue__project` for `issue.project_id` referencing `project`.
  - `uniq__table__columns` (UNIQUE): `columns` is the name(s) of the columns, separated by `__`
  - `index__table__columns` (INDEX): `columns` is the name(s) of the columns, separated by `__`
  - `check__table__param` (CHECK): `param` is a short description of the check
  - Generally:
    - Column names can be shortened when it's clear, e.g. `tag` instead of `tag_id`.
    - Multiple columns can be referenced by their meaning instead of listing all columns, e.g. `key` instead of `project_id__issue_number`.
    - References can be shortened as well, e.g. `issue1` instead of `issue__id`. This is even mandatory when the same table is referenced twice.
    - If needed, use `__` for separating table and columns, e.g. `referenced_table__column_1` vs. `referenced_table__column_2`.
- In controllers, we use jOOQ `Record` classes as an indirection, not the GraphQL POJO directly.
  - This helps if some column/field in the database does not directly map to the GraphQL POJO.
  - GraphQL POJO classes should implement a static `fromRecord()` function for this.
  - Common pattern:

    ```kotlin
    @QueryMapping
    fun foos(fieldSelectionSet: DataFetchingFieldSelectionSet): List<Foo> {
        val fields = determineFieldsToLoad(fieldSelectionSet, emptySet())

        return create
            .select(fields)
            .from(FOO)
            .fetchInto(FooRecord::class.java)
            .map { Foo.fromRecord(it) }
    }
    ```

  - Records also help when updating an entity: without looking at the requested fields, we are safe by returning `Foo.fromRecord(it)`, e.g.

    ```kotlin
    @MutationMapping
    fun updateFoo(@Argument input: UpdateFooInput): UpdateFooResponse {
        val errors = GraphQLErrors()

        val fooRecord = create
            .selectFrom(FOO)
            .where(FOO.ID.eq(input.id))
            .fetchOne()
            ?: GraphQLErrors.throwApiExceptionWithNotFoundType(null, "There is no foo with ID '${input.id}'.")

        // ... validation

        errors.ifAnyThrowApiException()

        // updating the changed fields only
        fooRecord.text = input.text
        fooRecord.modificationTime = now

        create.executeUpdate(fooRecord)

        return UpdateFooResponse(Foo.fromRecord(fooRecord))
    }
    ```
- Field selection is done by `determineFieldsToLoad()` functions.
  - They map each GraphQL field manually, e.g.

    ```kotlin
    when {
        it == "id" -> listOf(ISSUE.ID)
        // ...
    }
    ```

  - Child fields must be mapped so that their `DataFetcher`s can load their association by foreign key, e.g.

    ```kotlin
    when {
        it == "nestedField" -> emptyList()
        it.startsWith("nestedField/") -> listOf(FOO.NESTED_FIELD_ID) // FK needed for child fields
        // ...
    }
    ```

  - A `when` block with an `else -> throw` ensures we don't forget any field. Particularly handy when you overlook an `extend type Foo` mapping in the GraphQL schema.

    ```kotlin
    when {
        // ...
        else -> throw AssertionError("Unknown field '$it'. Schema should not allow that.")
    }
    ```
- Connect DBeaver to H2 database: dbeaver/dbeaver#20676 (comment)
The fixed logging configuration, checked into VCS, is configured within application.yaml.

Additional logging configuration can be done individually via environment variables, for example

```bash
LOGGING_LEVEL_BIZ_THEHACKER=DEBUG
LOGGING_LEVEL_ORG_JOOQ_TOOLS=DEBUG
```

to enable general DEBUG logging for the packages biz.thehacker (so the whole tH-Wiki) and org.jooq.tools (which e.g. includes org.jooq.tools.LoggerListener to output all queries and their results).

Note: It's not possible to configure specific loggers by environment variables that way, only full packages. See https://docs.spring.io/spring-boot/reference/features/logging.html#features.logging.log-levels. Configuring individual loggers works like this:

```bash
SPRING_APPLICATION_JSON='{"logging.level.org.jooq.tools.LoggerListener": "DEBUG"}'
```
Simply change the src/main/resources/schema.sql file, then run

```bash
./gradlew jooqCodegen
```

Afterwards, you can start/continue coding with the newly generated files.
Run with demo data:

```bash
./gradlew bootJar

THWIKI_CORS_ORIGIN="http://localhost:5173" \
java \
  -jar -Dspring.profiles.active=demo \
  build/libs/th-wiki-1.0-SNAPSHOT.jar
```

Run in-memory:

```bash
./gradlew bootJar

THWIKI_CORS_ORIGIN="http://localhost:5173" \
java \
  -jar -Dspring.profiles.active=in-memory \
  build/libs/th-wiki-1.0-SNAPSHOT.jar
```

Run persisted (H2 file database):

```bash
./gradlew bootJar

mkdir -p storage # ensure directory exists

THWIKI_CORS_ORIGIN="http://localhost:5173" \
SPRING_DATASOURCE_URL=jdbc:h2:file:./storage/th-wiki \
SPRING_SQL_INIT_MODE=ALWAYS \
java -jar build/libs/th-wiki-1.0-SNAPSHOT.jar
```

Run with a debugger:

```bash
./gradlew bootJar

THWIKI_CORS_ORIGIN="http://localhost:5173" \
java \
  -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005 \
  -jar -Dspring.profiles.active=demo \
  build/libs/th-wiki-1.0-SNAPSHOT.jar

# Use "Attach to process" in your debugger
```

Run with Docker (matches the "in-memory" IntelliJ run configuration):

```bash
./gradlew bootJar

docker build -t th-wiki .

docker run --rm \
  -p 8080:8080 \
  -e THWIKI_CORS_ORIGIN="http://localhost:5173" \
  -e THWIKI_STORAGE_PATH="" \
  -e SPRING_PROFILES_ACTIVE="in-memory" \
  th-wiki
```

Run with Docker and persistence (matches the "persisted" IntelliJ run configuration):

Note: /th-wiki/storage is already configured within the Docker image, you just have to mount it with -v.

```bash
./gradlew bootJar

docker build -t th-wiki .

docker run --rm \
  -p 8080:8080 \
  -v /home/thehacker/IdeaProjects/th-wiki/th-wiki/storage:/th-wiki/storage \
  -e SPRING_DATASOURCE_URL=jdbc:h2:file:/th-wiki/storage/th-wiki \
  -e SPRING_SQL_INIT_MODE=ALWAYS \
  -e THWIKI_CORS_ORIGIN=http://localhost:5173 \
  -e THWIKI_BACKUP_SCHEDULE="0 30 3 * * SUN" \
  -e THWIKI_BACKUP_RETENSION="keep-weekly=4,keep-monthly=12,keep-yearly=50" \
  th-wiki
```

On my private projects I usually keep merge commits to have a cleaner history of the bigger features
split over multiple commits. Between bigger features there is usually a clean-up phase where I do smaller improvements or refactorings not directly related to the previous feature. They either go into some feature/misc branch or directly to master.

With tH-Wiki I tried a 100% linear history, state of the art nowadays. While this works well in professional projects with big teams, it does not for private projects, where there are sometimes breaks of months until work continues.
I regretted the decision. On 2025-02-08 I rewrote the history of the th-wiki and th-wiki-ui repositories completely. Retroactively, I reconstructed the branches and added the missing merge commits by `--no-ff` merging. Since these commits are new, they got a fresh author date of 2025-02-08. (I did not artificially set a fitting date, but rather kept the real one.) The other commits, which were cherry-picked, only got their commit date changed, but kept their original author date (preserved by Git).