Latest Comments by YoRHa-2B
The Linux version of RUINER is now on GOG, with 50% off
18 Apr 2018 at 4:37 pm UTC Likes: 2

Heard a bit about this game, didn't really care, but GOG + native Linux version? Well that changes things.

DXVK, the Vulkan compatibility layer for Direct3D 11 and Wine has a fresh release reducing CPU overhead
13 Apr 2018 at 9:49 am UTC

@cRaZy-bisCuiT Since DXVK doesn't hot-patch or otherwise inject code into the process after the DLLs have been loaded (unlike tools like Reshade etc.), that shouldn't be too much of an issue with most anti-cheat solutions.

If something works with wined3d, it should work with DXVK as well, at least in that regard.

Wine 3.2 released with gamepad improvements and more Direct3D work
17 Feb 2018 at 8:14 am UTC

HID Gamepad support means that we don't need to use dumbxinputemu etc. anymore? If that's the case, great news.

VK9, the project to get Direct3D 9 applications to run with Vulkan reached another milestone
9 Jan 2018 at 10:58 pm UTC Likes: 5

Quoting: TheRiddick
I can imagine it being great for NVIDIA users but RADV users have some issues with Vulkan performance under Linux, so until that happens the inbuilt Wine D3D9 methods are likely to be better for AMD users for a while.
I'm developing DXVK on RADV, and even though it tends to be slower than radeonsi in GPU-limited scenarios, it's quite reliable and has much lower CPU overhead than AMD's Windows driver, and CPU overhead is basically what needs to be minimized. Nier is quite a bit faster on Vulkan than on wined3d already (see https://imgur.com/a/Byrph), although it still isn't playable due to frequent crashes and GPU lockups.

Both projects may eventually implement something like CSMT as well in order to further improve performance. As others have mentioned, binding pipelines and updating descriptor sets etc. can be quite expensive.

Wine 2.21 is out with Direct 3D indirect draws support, also fixes for The Witcher 3 and NieR:Automata
15 Nov 2017 at 10:06 pm UTC

Quoting: wojtek88
How does the NieR: Automata perform like? In Wine DB I can see it's silver. Does anyone managed to play the game?
The game has been working very well with wine-staging ever since the bug mentioned in the release notes was fixed.


This is on an old Phenom II X6 and an RX 480 with mesa-git: 40+ FPS most of the time, dropping into the 30s only in some demanding areas. My Xbox One pad also works fine when using dumbxinputemu (don't even try to play this with mouse+keyboard).

On Mesa, especially with LLVM 5.0 or later, you may need to create some shader overrides (see spoiler below) in order to work around some nasty LLVM/shader compiler issues. Otherwise, the game will probably render with artifacts.

Spoiler:
Save these shader files to /some/directory, then launch the game (or Steam) with MESA_SHADER_READ_PATH=/some/directory set.
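
For anyone scripting the launch, here is a minimal Python sketch of that step. Only the MESA_SHADER_READ_PATH variable and the /some/directory placeholder come from the post; the `build_env` and `launch` helpers and the `["steam"]` command are illustrative assumptions, not part of any tool.

```python
import os
import subprocess


def build_env(shader_dir, base_env=None):
    """Return a copy of the environment with MESA_SHADER_READ_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_SHADER_READ_PATH"] = shader_dir
    return env


def launch(command, shader_dir):
    """Start `command` (a list such as ["steam"], an assumed example)
    with the shader override directory in its environment."""
    return subprocess.Popen(command, env=build_env(shader_dir))


if __name__ == "__main__":
    # Only print the variable here, so running the sketch launches nothing.
    print(build_env("/some/directory")["MESA_SHADER_READ_PATH"])
```

Calling `launch(["steam"], "/some/directory")` would then start the process with the variable set, which should be equivalent to prefixing the command with the variable in a shell.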

FS_feba74122c9590d1522b92bbec52d662ecd99012.glsl
#version 440
#extension GL_ARB_gpu_shader5 : enable
#extension GL_ARB_shader_atomic_counters : enable
#extension GL_ARB_shader_bit_encoding : enable
#extension GL_ARB_shader_image_load_store : enable
#extension GL_ARB_shader_image_size : enable
#extension GL_ARB_shader_storage_buffer_object : enable
#extension GL_ARB_shading_language_420pack : enable
#extension GL_ARB_shading_language_packing : enable
#extension GL_ARB_texture_cube_map_array : enable
#extension GL_ARB_texture_gather : enable
#extension GL_ARB_texture_query_levels : enable
#extension GL_ARB_uniform_buffer_object : enable
#extension GL_EXT_texture_array : enable
#extension GL_ARB_conservative_depth : enable
#extension GL_ARB_derivative_control : enable
#extension GL_ARB_explicit_attrib_location : enable
#extension GL_ARB_fragment_coord_conventions : enable
#extension GL_ARB_fragment_layer_viewport : enable
#extension GL_ARB_shader_texture_lod : enable
uniform vec4 ps_icb[4];
layout(std140, binding = 0) uniform block_ps_cb0 { vec4 ps_cb0[15]; };
layout(std140, binding = 10) uniform block_ps_cb10 { vec4 ps_cb10[23]; };
layout(std140, binding = 13) uniform block_ps_cb13 { vec4 ps_cb13[1]; };
layout(binding = 0)
uniform sampler2D ps_sampler0;
layout(binding = 1)
uniform sampler2DShadow ps_sampler1;
vec4 R0;
vec4 R1;
vec4 R2;
vec4 R3;
vec4 R4;
vec4 R5;
vec4 R6;
vec4 R7;
vec4 R8;
vec4 X0[4];
vec4 tmp0;
vec4 tmp1;
in shader_in_out {
 vec4 reg0;
 vec4 reg1;
 vec4 reg2;
 vec4 reg3;
 vec4 reg4;
 vec4 reg5;
 vec4 reg6;
 vec4 reg7;
 vec4 reg8;
 vec4 reg9;
 vec4 reg10;
 vec4 reg11;
 vec4 reg12;
 vec4 reg13;
 vec4 reg14;
 vec4 reg15;
 vec4 reg16;
 vec4 reg17;
 vec4 reg18;
 vec4 reg19;
 vec4 reg20;
 vec4 reg21;
 vec4 reg22;
 vec4 reg23;
 vec4 reg24;
 vec4 reg25;
 vec4 reg26;
 vec4 reg27;
 vec4 reg28;
 vec4 reg29;
 vec4 reg30;
 vec4 reg31;
} shader_in;
vec4 ps_in[32];
vec4 vpos;
layout(location = 0) out vec4 ps_out0;
layout(location = 1) out vec4 ps_out1;
layout(location = 2) out vec4 ps_out2;
layout(location = 3) out vec4 ps_out3;
layout(location = 4) out vec4 ps_out4;
layout(location = 5) out vec4 ps_out5;
layout(location = 6) out vec4 ps_out6;
layout(location = 7) out vec4 ps_out7;
void main() {
  vpos = gl_FragCoord;
  ps_in[0].xyzw = vpos.xyzw;
  ps_in[1].xy = shader_in.reg1.xy;
  ps_in[2].xyzw = shader_in.reg2.xyzw;
  R0.x = (texture(ps_sampler0, ps_in[1].xy).x);
  R0.x = ((R0.x * ps_cb13[0].y) + ps_cb13[0].x);
  R0.yz = (R0.xx * ps_in[2].xy);
  R1.xy = (R0.yz * ps_in[2].zw);
  R1.z = (-R0.x);
  R1.w = (uintBitsToFloat(0x3f800000u));
  R0.x = (dot(R1.xyzw, ps_cb0[12].xyzw));
  R0.y = (dot(R1.xyzw, ps_cb0[13].xyzw));
  R0.z = (dot(R1.xyzw, ps_cb0[14].xyzw));
  R0.w = (uintBitsToFloat(0x3f800000u));
  R1.x = (dot(R0.xyzw, ps_cb10[7].xyzw));
  R1.y = (dot(R0.xyzw, ps_cb10[8].xyzw));
  R1.w = (dot(R0.xyzw, ps_cb10[9].xyzw));
  R1.z = (dot(R0.xyzw, ps_cb10[10].xyzw));
  X0[0].xyw = (R1.xyz);
  R1.x = (R1.w / R1.z);
  R2.xyzw = (ps_cb10[0].xyzw * ps_cb10[1].xyzw);
  R1.x = ((-R2.x * uintBitsToFloat(0x3dcccccdu)) + R1.x);
  X0[0].z = (R1.x);
  R1.x = (dot(R0.xyzw, ps_cb10[11].xyzw));
  R1.y = (dot(R0.xyzw, ps_cb10[12].xyzw));
  R1.w = (dot(R0.xyzw, ps_cb10[13].xyzw));
  R1.z = (dot(R0.xyzw, ps_cb10[14].xyzw));
  X0[1].xyw = (R1.xyz);
  R1.x = (R1.w / R1.z);
  R1.x = ((-R2.y * uintBitsToFloat(0x3dcccccdu)) + R1.x);
  X0[1].z = (R1.x);
  R1.x = (dot(R0.xyzw, ps_cb10[15].xyzw));
  R1.y = (dot(R0.xyzw, ps_cb10[16].xyzw));
  R1.w = (dot(R0.xyzw, ps_cb10[17].xyzw));
  R1.z = (dot(R0.xyzw, ps_cb10[18].xyzw));
  X0[2].xyw = (R1.xyz);
  R1.x = (R1.w / R1.z);
  R1.x = ((-R2.z * uintBitsToFloat(0x3dcccccdu)) + R1.x);
  X0[2].z = (R1.x);
  R1.x = (dot(R0.xyzw, ps_cb10[19].xyzw));
  R1.y = (dot(R0.xyzw, ps_cb10[20].xyzw));
  R1.w = (dot(R0.xyzw, ps_cb10[21].xyzw));
  R1.z = (dot(R0.xyzw, ps_cb10[22].xyzw));
  X0[3].xyw = (R1.xyz);
  R0.x = (R1.w / R1.z);
  R0.x = ((-R2.w * uintBitsToFloat(0x3dcccccdu)) + R0.x);
  X0[3].z = (R0.x);
  R0.xyzw = (uintBitsToFloat(uvec4(0u, 0u, 0u, 0u)).xyzw);
  for (;;) {
    R1.x = uintBitsToFloat(floatBitsToUint(R0).y >= 0x4u ? 0xffffffffu : 0u);
//     if (bool(floatBitsToUint(R1).x))  // Original code generated by wine, shows artifacts
    if (floatBitsToUint(R0).y >= 0x4u)  // Technically the same condition as above, works properly
      break;
    R1.xyzw = (X0[floatBitsToInt(R0).y + 0].xyzw);
    R1.xy = (R1.xy / R1.ww);
    R1.z = (max(abs(R1.z), abs(R1.y)));
    R1.z = (max(R1.z, abs(R1.x)));
    R1.z = uintBitsToFloat(uintBitsToFloat(0x3f7d70a4u) >= R1.z ? 0xffffffffu : 0u);
    if (bool(floatBitsToUint(R1).z)) {
      R0.zw = (R1.xy);
      break;
    }
    R0.xy = intBitsToFloat(floatBitsToInt(R0).xy + ivec4(0x1, 0x1, 0, 0).xy);
    R0.zw = (R1.xy);
  }
  R0.y = uintBitsToFloat(floatBitsToUint(R0).x < 0x4u ? 0xffffffffu : 0u);
  if (bool(floatBitsToUint(R0).y)) {
    R0.y = (R0.z + uintBitsToFloat(0x3f800000u));
    R0.z = uintBitsToFloat(floatBitsToUint(R0).x & 0x1u);
    R0.z = (float(floatBitsToUint(R0).z));
    R0.z = (R0.z * uintBitsToFloat(0x3f000000u));
    R1.x = ((R0.y * uintBitsToFloat(0x3e800000u)) + R0.z);
    R0.y = (-R0.w + uintBitsToFloat(0x3f800000u));
    R0.z = uintBitsToFloat(floatBitsToUint(R0).x >> 0x1u);
    R0.z = (float(floatBitsToUint(R0).z));
    R0.z = (R0.z * uintBitsToFloat(0x3f000000u));
    R1.y = ((R0.y * uintBitsToFloat(0x3e800000u)) + R0.z);
    R1.z = (X0[floatBitsToInt(R0).x + 0].z);
    R0.x = (dot(ps_cb10[2].xyzw, ps_icb[floatBitsToInt(R0).x + 0].xyzw));
    R0.yzw = ((R0.xxx * uintBitsToFloat(uvec4(0u, 0x3f000000u, 0x3f000000u, 0u)).yzw) + R1.xyz);
    R2.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbf000000u, 0xbf000000u, 0u, 0u)).xyz) + R1.xyz);
    R3.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbf000000u, 0x3f000000u, 0u, 0u)).xyz) + R1.xyz);
    R4.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3f000000u, 0xbf000000u, 0u, 0u)).xyz) + R1.xyz);
    R5.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbfc00000u, 0x3f000000u, 0u, 0u)).xyz) + R1.xyz);
    R6.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3fc00000u, 0xbf000000u, 0u, 0u)).xyz) + R1.xyz);
    R7.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbf000000u, 0x3fc00000u, 0u, 0u)).xyz) + R1.xyz);
    R8.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3f000000u, 0xbfc00000u, 0u, 0u)).xyz) + R1.xyz);
    R1.w = (vec4(textureLod(ps_sampler1, vec3(R1.xy, R1.z), 0)).w);
    R0.y = (vec4(textureLod(ps_sampler1, vec3(R0.yz, R0.w), 0)).y);
    R0.y = ((R0.y * uintBitsToFloat(0x3f4ccccdu)) + R1.w);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R2.xy, R2.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3f4ccccdu)) + R0.y);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R3.xy, R3.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3f4ccccdu)) + R0.y);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R4.xy, R4.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3f4ccccdu)) + R0.y);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R5.xy, R5.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3ee66666u)) + R0.y);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R6.xy, R6.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3ee66666u)) + R0.y);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R7.xy, R7.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3ee66666u)) + R0.y);
    R0.z = (vec4(textureLod(ps_sampler1, vec3(R8.xy, R8.z), 0)).z);
    R0.y = ((R0.z * uintBitsToFloat(0x3ee66666u)) + R0.y);
    R0.z = uintBitsToFloat(R0.y != uintBitsToFloat(0u) ? 0xffffffffu : 0u);
    if (bool(floatBitsToUint(R0).z)) {
      R2.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3fc00000u, 0x3fc00000u, 0u, 0u)).xyz) + R1.xyz);
      R3.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbfc00000u, 0xbfc00000u, 0u, 0u)).xyz) + R1.xyz);
      R4.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbfc00000u, 0x3fc00000u, 0u, 0u)).xyz) + R1.xyz);
      R5.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3fc00000u, 0xbfc00000u, 0u, 0u)).xyz) + R1.xyz);
      R6.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3fc00000u, 0x3f000000u, 0u, 0u)).xyz) + R1.xyz);
      R7.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0xbfc00000u, 0xbf000000u, 0u, 0u)).xyz) + R1.xyz);
      R8.xyz = ((R0.xxx * uintBitsToFloat(uvec4(0x3f000000u, 0x3fc00000u, 0u, 0u)).xyz) + R1.xyz);
      R0.xzw = ((R0.xxx * uintBitsToFloat(uvec4(0xbf000000u, 0u, 0xbfc00000u, 0u)).xzw) + R1.xyz);
      R1.x = (vec4(textureLod(ps_sampler1, vec3(R2.xy, R2.z), 0)).x);
      R1.x = ((R1.x * uintBitsToFloat(0x3e19999au)) + R0.y);
      R1.y = (vec4(textureLod(ps_sampler1, vec3(R3.xy, R3.z), 0)).y);
      R1.x = ((R1.y * uintBitsToFloat(0x3e19999au)) + R1.x);
      R1.y = (vec4(textureLod(ps_sampler1, vec3(R4.xy, R4.z), 0)).y);
      R1.x = ((R1.y * uintBitsToFloat(0x3e19999au)) + R1.x);
      R1.y = (vec4(textureLod(ps_sampler1, vec3(R5.xy, R5.z), 0)).y);
      R1.x = ((R1.y * uintBitsToFloat(0x3e19999au)) + R1.x);
      R1.y = (vec4(textureLod(ps_sampler1, vec3(R6.xy, R6.z), 0)).y);
      R1.x = ((R1.y * uintBitsToFloat(0x3ee66666u)) + R1.x);
      R1.y = (vec4(textureLod(ps_sampler1, vec3(R7.xy, R7.z), 0)).y);
      R1.x = ((R1.y * uintBitsToFloat(0x3ee66666u)) + R1.x);
      R1.y = (vec4(textureLod(ps_sampler1, vec3(R8.xy, R8.z), 0)).y);
      R1.x = ((R1.y * uintBitsToFloat(0x3ee66666u)) + R1.x);
      R0.x = (vec4(textureLod(ps_sampler1, vec3(R0.xz, R0.w), 0)).x);
      R0.x = ((R0.x * uintBitsToFloat(0x3ee66666u)) + R1.x);
      R0.x = (R0.x * uintBitsToFloat(0x3df3cf3du));
    } else {
      R0.x = (R0.y * uintBitsToFloat(0x3e2aaaabu));
    }
    R0.x = (-R0.x + uintBitsToFloat(0x3f800000u));
  } else {
    R0.x = (uintBitsToFloat(0x3f800000u));
  }
  ps_out0.xyzw = (R0.xxxx);
  return;
}
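
The uintBitsToFloat(0x...u) literals in the shader above are IEEE 754 single-precision bit patterns, which makes the wine-generated code hard to read. A small Python sketch to decode them follows; the helper names are made up for illustration. For instance, 0x3f800000 is 1.0 and 0x3f000000 is 0.5. Reading the sample weights this way suggests the shader is a weighted shadow-map filter whose weights are normalized to 1, but that interpretation is a guess from the numbers, not something stated in the post.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Bit patterns copied from the fragment shader above.
CONSTANTS = [
    0x3f800000, 0x3f000000, 0x3e800000, 0x3fc00000, 0xbf000000, 0xbfc00000,
    0x3dcccccd, 0x3f7d70a4, 0x3f4ccccd, 0x3ee66666, 0x3e19999a,
    0x3e2aaaab, 0x3df3cf3d,
]


def filter_weight_totals():
    """Sum the sample weights used by the two branches of the shader.

    The first textureLod result is added unscaled (weight 1.0); the
    remaining samples are scaled by the constants decoded below.
    """
    w_a = bits_to_float(0x3f4ccccd)  # used for four samples
    w_b = bits_to_float(0x3ee66666)  # used for four samples in each stage
    w_c = bits_to_float(0x3e19999a)  # used for four extra samples
    base = 1.0 + 4 * w_a + 4 * w_b          # nine samples
    extended = base + 4 * w_c + 4 * w_b     # seventeen samples
    return base, extended


if __name__ == "__main__":
    for bits in CONSTANTS:
        print(f"0x{bits:08x} -> {bits_to_float(bits)!r}")
    base, extended = filter_weight_totals()
    # Each total times its scale factor should come out very close to 1.
    print(base * bits_to_float(0x3e2aaaab))
    print(extended * bits_to_float(0x3df3cf3d))
```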


CS_2a215c714114a24e60cf40a87f5226cfa26a5df5.glsl
#version 430

layout(std140, binding = 76)
uniform block_cs_cb1 {
  vec4 unused;          
  vec2 img_size;        // Image size, in pixels
  vec2 num_workgroups;  // Number of CS workgroups (we use gl_NumWorkGroups instead)
};

layout(binding = 160)
uniform sampler2D cs_sampler0;  // Scaled scene image

layout(binding = 0)
writeonly uniform uimageBuffer cs_image0;  // Output buffer

shared float cs_g0[64];

layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

void main() {
  float sum = 0.0f;
  
  for (int y = 0; y < 8; y++) {
    for (int x = 0; x < 8; x++) {
      ivec2 coord = 8 * ivec2(gl_WorkGroupID.xy) + ivec2(x, y);
      if ((coord.x < img_size.x) && (coord.y < img_size.y)) {
        vec3 color = texelFetch(cs_sampler0, coord, 0).xyz;
        sum += dot(color, vec3(0.298912f, 0.586611f, 0.114478f));
      } else {
        sum += 1.0f;
      }
    }
  }

  imageStore(cs_image0,
    int(gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x),
    uvec4(floatBitsToUint(sum), 0, 0, 0));
}
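
As a rough CPU reference for what the compute shader above calculates, here is a Python sketch: one luminance sum per 8x8 tile, with out-of-bounds pixels counted as 1.0, written out in row-major tile order. The function name and the list-of-rows image format are assumptions for illustration. The tile count is derived here by rounding the image size up, whereas the shader simply uses whatever workgroup count the game dispatches.

```python
LUMA_WEIGHTS = (0.298912, 0.586611, 0.114478)  # same weights as the shader
TILE = 8  # each workgroup covers an 8x8 pixel block


def tile_luma_sums(image):
    """Return one luminance sum per 8x8 tile, in row-major tile order.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Assumption: enough tiles to cover the image, rounded up.
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE

    sums = []
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            total = 0.0
            for y in range(TILE):
                for x in range(TILE):
                    px, py = TILE * tx + x, TILE * ty + y
                    if px < width and py < height:
                        r, g, b = image[py][px]
                        total += (r * LUMA_WEIGHTS[0]
                                  + g * LUMA_WEIGHTS[1]
                                  + b * LUMA_WEIGHTS[2])
                    else:
                        # Pixels outside the image count as full brightness.
                        total += 1.0
            sums.append(total)
    return sums


if __name__ == "__main__":
    black = [[(0.0, 0.0, 0.0)] * 9 for _ in range(8)]  # 9x8 image, two tiles
    print(tile_luma_sums(black))
```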